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I. INTRODUCTION 


Around 1964 a new term appeared in the computer litera- 
ture to denote a new concept. The term was "database," and 
it was gcing to play a highly significant role in an organi- 
zation's information systen. The information systen 
supports the organization's functions, maintaining the data 
for these functions and assisting users to interpret the 
data for decision making. The database becomes an important 
tool in this process; it is the container of the data in the 
information systen. 

Tn many information systems, database denotes collec- 
tions of data shared ry end-users of computer systems. Users 
who make decisions obtain data by accessing the database and 
then recording their deciSion in it. EaSy access to a 
variety of data fret a number of locations enables the 
information system to quickly respond to the needs cf deci- 
Sion makers within the organization, whereas poor access can 
of course hinder rapid response. If the data are not 
readily available, decisions may be either delayed unneces- 
Sarily or made with incomplete data, leading to fossible 
system malfunction in the future. 

PMicerlexiombity Of the database structures iS a very 
important feature to meet changing organizational needs. As 
new functions arise in an organization, new decisions fcllow 
in their wake. Since the database will need to store hew 
data and accommodate new relationships to support the hew 
decisions, it must include facilities to allow such changes 
to be easily made. [Hef. 1:spp. 1-3] 

feday, computer applications in which many users at 
terminals concurrently access a database are called "data- 


base Sapelications” ~[Ref—. 2]. A Significant new kind of 
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software, the datarase management system, or DBMS, has 
evolved to facilitate the development of database applica- 
tions. The development of DBMS, in turn, has given rise to 
new languages, algorithms, and software technigues which 
together make up what might be called a database technology. 

Datakase technolcyy has been driven by, and toa large 
extent distinguished from other software technologies by the 
following broad user requirements. 

-Data consolidaticn 
- Data independence 
, -Data protection 

In the years’ ahead, database systems will become 
increasingly widespread and increasingly important. At 
present, however, they represent a new and relatively unexp- 
lored field, despite the fact that the number of systems 
installed or under development is growing rapidly. 

The primary goal cf this thesis is to present the design 
steps of a particular database system, design criteria, and 
the elements of the database system which provide designers 
with the ability tc evaluate databases against these 
criteria. The second cbjective of this thesis is to show the 
implementation of that database system which controls and 
executes the transactions written ina model-based database 
language such as Data Definition Language (DDL) and Data 
Manipulation Language (DML). Finally, the third objective of 
this study is to introduce essential features of the mtain- 
tainakility, administration, and security of a database 
Management systen. 

Chapter II describes the Lkasic concepts of datakase, 
including the definition of a DBS, its components, jts 
architecture, and some advantages/disadvantages. Chapter IilI 
Eriefly reviews the design objectives and techniques of a 
database and describes logical and physical database design. 


Chapter IV also briefly addresses database models which can 
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be used to form a logical framework of a database and to 
Support further design phases and/or to create intended 
database structure which will be implemented after the 
completicn of design phases. Chapter WV introduces, in 
detail, the Semantic Database Model for a personnel assign- 
ment datakase. Chapters VI and VII describe the design of 
the personnel database by using the Relational Database 
Model approach which is one of the three database models. In 
@ilaxtionsrules, design criteria, and important operaticns 
associated with this rodel-are given. Chapter VII also shows 
how the designer can transform the SDM model which has been 
designed for a perscnnel datakase system into a relationai 
datakase model. The INGRES Database Management System which 
is available today is discussed in Chapter VIII. Chapter Ix 
demonstrates the implementaticn of the relational database 
syster which is implemented on the VAX computer systems by 
uSing the ORACLE Relational DBMS. Chapter X descrictes the 
functicns of a DBMS, such as security features, maintain- 
ability, and concurrent processing control. Finally, conciu- 
Sions and recommendations Eased on our research are 


- presented in Charter XI. 


is 


II. BASIC CONCEPTS OF DATABASE 


a a —— > ee > a Soa 


A. DEFINITION OF A LATABASE SYSTEM 


The simplest definition of a database might be thata 
database is a collection of facts or a repository for stored 
data which is both integrated and shared. By "integrated" 
we mean that the datakase may be considered as a unification 
of several otherwise distinct data files, with any redun- 
dancy among those files partially or wholly eliminated. By 
"shared" we mean that individual pieces of data in the data- 
base may be shared among several different users, in the 
sense that each of those users may have access to the same 
piece of data. The term "shared" is also extended to cover 


concurrent sharing: that is, the ability for several users 


to be accessing the database at the same tine. {Ref. 3:pp. 
3-7 j 

RW. Engles [Ref. 4] refers to the data in a database as 
"operaticnal data," distintdguishing (ect con, apt dae 
output data, and other kinds of data. Thus, a modified 


version cf Engles’ original definition of database is that a 
database is a collection of stored operational data used by 
the application systems of some particular enterprise. 
"Enterprise" is simply a convenient generic term for any 
reasonakly self-contained commercial, scientific, technical, 
cr other organizaticn. Any enterprise must necessarily 
Maintain a large amount of data about its operation. This is 
its “cperational data," such as product data, account data, 
Military personnel data etc. 

In recent years, technology improved to the point where 
it became feasible te design, build, and operate large-scale 


collections of caer el A computer environment. In other 
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words, organizations realized that data were a valuatle 
resource and needed to be centrally managed. The Gonce picon 
a database has thus emerged fully only in recent years. A 
database can also be defined as a computerized collecticn of 
stored operational data that serves the needs of rultifle 
users within one or more organizations [Ref. S:pp. 3-17}. A 
key point is that the database is an integrated resource to 
re used Ey all members of the organizations who need infor- 
fatren CONtained in it. 

Since the database is an integrated and shared resource 
Seo muti ple = users within an organization, it shculd be 
managed for the orgarization's benefit and from its view- 
BOLnt, not by individual users. Thus~ge two. addityonal 
concepts have been developed to solve the proklem of 
Ponce rel ling and managing the organization's database 
resource. Initially, software was developed to provide a 
common interface between all users and the integrated data- 
Fase. A common interface promotes privacy and data integ- 
rity. Also, users cannot store information implicitly and 
must use and modify data in a manner consistent with the 
organization's viewpcint. The software, known as a database 
Management system, allows computer control of the data 
resource. A database management system (DBMS) 1s a collec- 
tion of software tools and access methods which enables the 
users to store facts about real-world objects and the rela- 
tionships between thcse objects, and to manipulate those 
facts by issuing gueries in content-addressable fcrm. [In 
short, a DEMS is a generalized tool for manipulating a data- 
base) [| Ref. S:pp. 3-17] ; it is made available through 
special software for the interrogation, Maintenance, and 
analysis of data. 

The second concert is that of the database administrator 
(DBA). The DBA can be thought of as one or more individuals, 


possifly aided by a staff, whe manage the organization's 
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database resource [Ref. 5:pp. 3-17], or are responsible TO 
overall control of the database system. THe DBA*sS Lesponei— 
bilities include the following [Ref£. 3:pp. 25~26]. 
-Deciding the infcrmation content of the database. 
~Deciding the stcrage structure and access strategy. 
~-Liaising with users. 
-Defining authorization checks and validation procedures. 
-Defining a strategy for backup and recovery. 
- Monitoring performance and responding to changes in 
requirements. 

We can clearly see the reason why an organizaticn should 
choose tc store its operational data in an integrated data- 
base. A database system provides the organization with 
centralized control of its operational data which is one of 
its mest valuable assets. This iS in sharp contrast to the 
Situation that prevailS in Many organizations today, where 
typically each application has its own private files so that 
the operational data is widely dispersed, and is therefore 
probably ditt teutey ec compenon. 


Ee. COMEFCHENTS OF A [ATABASE SYSTEM 


A database system consists of four major components: 
data, hardware, software, and users. Fig 2.1 shows a greatly 


Simplified view of the major components of a database 


systen. 
1. Data 
The data stcred in the system 1S partitioned into 
che or more databases. For tutorial purposes it is usually 


convenient to assume that there is just one datakase, 
containing the totality of all stored data in the systen. 
According to Standard usage in the computer 


industry, bits are grouped into bytes or characters, 
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Figume 2.1 Siuplified View of a Database Systen. 

characters are grouped into fields, and fields are grouped 


jeer rececrds, A collection of records is called a file. At 
mis LOlREyeWe cannot say that a database is a collecticn of | 
fumes. A database is a collection of "integrated" files and 
relationships among records in those files. Database 
processing differs frem file processing in which the sStruc- 
ture of the files is distriktuted across the application 
programs and each file is considered to exist independently. 
Cn the other hand, the dataktase is self-describing Eecause 
[maeontainsc ewathin itself, .a description of its structure. 
Anotner difference tretween file processing and database 
concerns the term file. For file processing, the records in 
a file are usually grcuped together physically. For database 
processing, the logical collection of records protarly does 
not exist as a physical collection. In database processing, 
there are logical files, or collections of records having 
Meaning to users, and phySical files, or collections of 


records cn phySical devices. 


2. Hardware 


In general, database applications do not require 
special hardware. It consists of direct access stcrage (or 
secondary storage) devices (disks, drums, etc.) on wnaich the 
database resides, together with the associated devices, 
control units, channels, and so forth. It is assumed that 
the datakase is toc large to be stored inits entirety 
within the computer's primary storage. 

in toe ae a new term appeared and several vendors 
announced new products called "database machines" [Ref. 6:p. 
8 js These machines are special purpose computers that 
perform database processing functions. ACGOLGING tCOuetars 
type cf architecture, the main frame or host computer sends 
requests for service and data cver a channel to the database 
machine. The machine processes the requests and _ sends 
results, messages, or data back to the main computer. Thus 
database processing can be performed Simultaneously with 
applications processing. The actual effectiveness of such 
machines is under investigation. If substantial processing 
efficiencies can be froved at a reasonable cost, then data- 
base machines will become important. Hardware aspects of the 
System £f£crm a major topic in their own right; the frotlems 
encountered in this area are not peculiar to database 
systems, and those prcblems have been very thoroughly inves- 
tigated and documented elsewhere. Thus, this thesis is net 


concerned with nardware aspects of the systen. 
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The database management system or DBMS is a layer of 
software which provides the interface between the fhysical 
datakrase itself (i1.e., the data as actually stored) and the 
users of the system. Ail requests from users for access to 


the datakase are handled by the DBMS. One general function 
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provided by the DBMS is the separation of database users 
from hardware-level detail. In other words, the DEMS 
provides a view of the database that is elevated somewhat 
above the hardware level, and supports user operations (such 
as "get the OFFICER record for officer Buyukoner") that are 
expressed in terms of that higher-level view. This function, 
and other functions of the DBMS, will be discussed in detail 
later. 

Two types of programs involved in database 
processing are the Operating System (0S) and Communications 
Sores Frogram  ({CCPy- The operating system is a set of 
programs which controls the computer's resources. In a 
sense, the OS can be viewed as the glue that holds all of 
the other programs together. Communications control prcgran 
(ee P) performs ccmmunications-oriented tasks. On- Prine 
processing requests cr transactions are provided by users at 
terminals. The requests are received and routed by the CCP 


cver Communications lines. The CCP has several itportant 


auNet LOns: provides communications error detection and 
correcticn, manages terminal activity, routes messages to 
the correct next destination, and formats messages for 
various types of terminal equipment. The CCP alse routes 


on-line input to the next level of programs which contains 
application programs and database utilities. The oferating 
system and the CCP will not be discussed further in this 


thesis. 
4. Users 


There are three broad classes of user being consid- 


ered: application programmers, end-users, and the database 
administrator (DBA). [Ref. 3:p. 6} 
The application programmer is responsitle for 


Woiting application programs that use the database, typi- 


cally in a high-level language such as COBOL or PL/I. these 
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application programs are used with the data for retrieving 
information, creating new information, and deleting of 
changing existing information. The programs themselves ha y 
be conventional batch applications, or they may be “on-line" 
programs that are designed to support an end-user inter- 
acting with the syste from an on-line terminal. 

' The end-user Cahn access the database nes key ieee 
terminal. An end-user may, in general, perform all the func- 
tions of retrieval, creation, deletion, and modificaticn by 
employing a query language provided as an integral part of 
the systen, or by invoking a user-written application 
program that accepts commands from the terminal and in turn 
lssues reguests to the DBMS on the end-user's behalf. 

The database administrator, or DBA mentioned earlier 
in this Chapter, is the person (or group of persons) respon- 
Sible for overall control of the database system. The func- 
tion of the DBA staff is to serve aS a protector of the 
database and as a focal point for resolving users’ 


conflicts. 


C. ADVANTAGES AND DISADVANTAGES OF DATABASE PROCESSING 


The main advantage of datakase processing is included in 
its definition given previously. Integrated and shared data 
offers those important advantages. Database processing 
allows mcre informaticn to be produced from a given amount 
of data. Secondly, the amount of redundancy in stored data 
can be minimized. In other words, the elimination or reduc- 
tion of data duplication allows data to only be stored cnce. 
As a result, this saves file space, and to some extent, can 
reducé ;rocessing requirements. PRei.s "es pp. 3-6", and 
[Ref. V:pp. 16 ] 

As mentioned earlier, centralized control of the ofera- 
tional data in a datakase provides the following advantages 
{Refs 33cps 0-27. 
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This 1s reaily a corollary of the above point. Ifa 
given fact about the real world is represented by two 
different entries in the database and the redundancy is not 
contrclled, then there will be some occasions for which the 
two entries will not agree (that is, when only one has been 
updated). At such times the database is said to he inconsis- 
tem. In this case, the database produces incorrect or 
Conn licting anformaticn. If the redundancy is controlled, 
then the system could guarantee that the database is never 
inconsistent as seen by the user, by ensuring that any 
change made to either of the two entries is automatically 
made to the other. This process is known as propagating 
updates (the term "update" is used to cover all the opera- 


tions or creation, deletion, and modification). 
2. Shared 


The concept of shared data was discussed in Section 


3. Enforcement cf Standards 


The applicable standards, which may include any or 
all of the following: installation, company, industry, and 
national standards, are followed in the frepresentaticn of 
the data. Standardizing stored data formats assists in data 


interchange or migration between systems. 
Moteelication cL Security Restrictions 


The DBA can define authorization checks tc be 
Carried out whenever access to sensitive data is attempted 


(see Chapter X for more detail). 
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aS aintenance of Data 


The problem cf integrity is the problem of ensuring 
that the data in the database 1s accurate. Inconsistency 
between two entries representing the same "fact" leads toa 
jack cf data integrity (which can occur only if redundancy 
exists in the stored data). It is essential to point out 
that data integrity is even more important in a database 
system than in a "private files" environment, because the 
datakase is Shared. Centralized control of the database 


Supports data integrity. 


6. Balancing of Conflicting Requirements 


Knowing the overall reguirements of the enterfrise, 
the DBA can structure the datakase system to provide an 
overall service that is "best for the enterprise." 

The cost of database processing may become a major 
disadvantage. It can ke expensive. The DBMS may need so much 
primary storage that additional storage must be purchased. 
Even with more storage, it may get exclusive control of the 
CPU, thus forcing the user to upgrade to a more powerful 
CON Pir Clee ekcien O - Dram oon) 

Once the database is implemented, operating costs 
for scme systems will be higher. For example, sequential 
processing will never be done as fast in the database envi- 
ronment, since it causes excessive overhead. 

Large amounts of data in different formats can be 
interrelated in the database. Both the database system and 
the application programs must be able to process’ these 
structures. This requires more sophisticated programming, 
takes time, and requires highly skilled programming 
personnel. Thus, the complexity is another important disad- 
vantage of database processing. Backup and recovery also 


increases complexity and are more difficult in the database 
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environment to carry out. Another reason for this is that 
databases are often processed by multiple users ccncur- 
rently. Determining the exact state of the database at the 
time cf failure may be a problem. Given that, it may be even 
more difficult to determine what should be done next. 
Another disadvantage is that integration, and hence 

centralization, increases vulnerability. A failure in one 
component of an integrated system can cause the entire 
System to fail. This event is especially critical if the 
operation of the user organization depends on the database. 

To avoid these potential drawbacks a database 
Management system (DBMS) should satisfy the following ckjec- 
tives [Ref. Jspp. 13-14] : 

-Different functions of an enterprise can be served 
effectively by the same DBMS. 

»-Redundancy in stcred data can be minimized. 

-Consistent information can be supplied for the decision- 
making process. 

~security controls can be applied. 

-Application programs can be developed, maintained, and 
enhanced faster and more economically, with fewer 
skilled personnel. 

-Physical reorganization of the stored data 1s easy. 

-Céentralized contrcl of the database is possible. 

2oasler procedures for computer operations can be 
estaklished. 


D. AN ARCHITECTURE FCR A DATABASE SYSTEM 


An architecture fcr a dataktase system is illustrated in 
Ege 2-2 | Ref. Be. 207e This picture presents a framework 
which is extremely useful for describing general database 
concepts and for explaining the structure of individual 


systems, and it is ir broad agreement with that proposed Dy 


es 


the ANSI/SPARC Study Group on Data Base Management Systems 
[Ref. 8]. 

The architecture is divided into three general levels: 
internal, conceptual, and external. Generally speaking, the 
external level is the one closest to the users; that is, the 
one ccncerned with the way in which the data is viewed by 
individual users. The internal level is the level clcesest to 
physical storage; that is, the one concerned with the way in 
which the data are actually stored. The conceptual ievel is 
a bridge or "level of indirection" between the cther two. 
There may be many “external views," each consisting of a 
more cr less user oriented logical representation cf scme 
portion of the database (Such as logical records and 
fields), and there may be a single "conceptual view," 
consisting of a sigilarly logical representation of the 
entire database. Likewise, there will be a single "internal 
view," representing the total database as actually stored. 

The three levels are also defined as levels of abstrac- 
tion and named in the specification of a database structure: 
the ccncerptual or enterj;rise administrator view, the imple- 
mentation view of the applications programmer or end uSer, 
and the physical view of the systems progranmer/analyst 
[Ref. 5:pp. 3-177]. The external elev eeee once )tualet evel, 
and internal level in the ANSI/SPARC model correspond tc the 
implementation level, conceptual level, and physical level 
in the levels of abstraction, respectively. Figure 2.3 
Shows these three levels of abstraction and Some of their 
primary components. 

It should be obvicus that between the computer, dealing 
with bits, and the ultimate user, dealing with abstractions 
such as pilitary units or assignment of personnel to a divi- 
sion, there will be many levels of abstraction. It should be 
emphasized that only the dataktase actually exists at the 
physical level. We may view the physical database itself at 
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Figure 2.3 Levels of Abstraction in a Database System. 


emphasizes that a view (or external view) is an abstract 
model of a portion of the conceptual database. As an example 
of the utility of views, the army may provide a computerized 
personnel assignment department, consisting of data anda 
collection of programs that deal with officers and military 
units. These programs, and the people who use them, dc not 


require kncwledge concerning fersonnel files or the assign- 


ment cf officers to units. The personnel department may 
need to know about assignnents, units, and aspects of the 
personnel files (e.g., which officers are gualified to 


asSign to unit X), but does not need to know about personnel 
Salaries. Thus, there may be one view of the datakase for 
the personnel department and another for the finance 


department. 


"In a sense, a view is just a small conceptual database, 
and it is .at the same level of abstraction as_ the 
conceptual database. However, there are senses in which 
a view can be "more abstract” thal awecechecComld luda ta— 
base, as the data dealt with by a view may be construc- 
table from the ccnceptual database but not actually 
present in that database." [Ref. 9:p. 7] 
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Figure 2.2 illustrates the several components of the 
architecture in more detail. Next, we will examine these 
components. 

Each user has a language at his or her disposal. For the 
application programmer it will be a high-level programming 
danguage, such as PlyI or COBOL; for the terminal user it 
will be either a guery language or ae special-purpose 
janguage supported by an on-line application program to meet 
the user's requirements. Each of those languages is knewn as 
"host language." The term "data sublanguage (DSL)" is a 
subset of the host language that is concerned with database 
objects and operations. In other words, the DSL is embedded 
ina host language. Multiple host languages and multifle 
DSLsS may be supported by a given system. In principle, any 
given data sublanguage is really a combination of two 
languages [Ref. 3:pp. 17-25}, a data definition language 
(OBL), #Whaicheprovides for the definition oredescrirticn of 
database objects, and a data manipulation language (DNL), 
which supports the manipulation or processing of such 
objects. In most systems today the data sublanguage and the 
host are very loosely coupled. Tha tikes, the definitions 
writter in DDL are completely outside the application 
program. 

An external view is the content of the database as it is 
seen Ey some particular user. In general, an external view 
consists of multiple occurrences of multiple types of 
external records [{Ref. 3:pp. 17-25). An external record 
refers to a “logical record" which is not necessarily the 
same as a stored reccrd (see Section E'‘of this Chapter). 

Each external view is defined by means of an external 
schema, which is made up of definitions of each of the 
different types of external records in that view. The tern 
"view" is used for a set of record occurrences and the term 


"schema" is used for the definition of that view. The DDL 
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porticn of the data sublanguage is used to write the 
external schema. That DDL is sometimes called an external 
CDL. 
| The conceptual view 1S a representation of the entire 
information content cf the database in a form that is some- 
what abstract in comparison with the way in which the data 
are physically stored. The conceptual view 1S composed of 
multiple cccurrences of multiple types of concertual 
records. It is more desirable to consider "entities," and 
"relaticnships" instead of dealing in terms of "concerftual 
records." A conceptual record is not the same as either an 
external record or a stored record. For example, the concer- 
tual view may consist of a collection oof branch record 
cccurrences plus a ccecllection of military personnel record 
occurrences plus a ccliection of course record occurrences, 
and so on. The conceptual view is defined by means cf the 
conceptual schema, which includes definitions of each of the 
several types of conceptual records. The conceptual view is 
aview of the total database content, and the concertual 
schema is a definiticn of this view. The conceptual schema 
is written using ancther DDL called conceptual DDL. It is 
intended that the definitions in the conceptual schema 
include pany additional features, such as the authorization 
checks and validation procedures, and these definitions must 
not invelve any considerations of storage structure or 
access strategy. In other words, there must not be any 
reference to stored field representations, physical 
sequence, hash-addressing, indexing, or any other storage/ 
access details. At this level, the situation allows the 
conceftual model to be "data independent" which will be 
discussed in the next Section. 

Seme authorities would suggest that the fundamental 
objective of the conceptual schema is to describe the entire 


enterprise; not just its operational data, but also how that 
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data is used, how the data flows within the enterprise, what 
Gueedeta 1S used» for at each peint, what audit controls are 
to be applied at each point, and So on. 

The internal view or physical level of the architecture 
is a very low-level representation of the entire datakase. 
it consists of multiple occurrences of multiple types of 
Stored records (the ANSI/SPARC refers to this term as 
“internal record"). The internal view is defined by means of 
the internal schema, which not oniy defines the various 
types of stored records but also specifies what indexes 
exist, how stored fields are represented, Whoa Ppl sical 
Seamence the stored records are in, and so on [Bef. 32pp. 
a2. ). “Another data definiticn language (the internal DDL) 
1s used to write the internal schema. It is convenient to 


use the tern "stored database" in place of "internal view," 


ara "stcLrage Structvre definition” im place of “dnternal 
schema." 
Iwo tewels of "apping are Shown in Fig. Zee ae 


concertual/internal mapping describes the corresfondence 
retween the conceptual view (or data model) and the stored 
datakase; it specifies how concertual records and fields map 
into their stored ccunterparts. fe them struecume of the 
stored database is changed, the concepvtual/internal mapping 
must ke changed accordingly, so that the conceptual schema 
may rcéemain invariant. For exaiple, aif a change is made to 
the storage structure definition of tke database, the 
e€ifects of such a change must be contained below the concer- 
jal level somthaty"ddta inderendence" can be accomplished’ 

An external/conceftual mapring describes the correspcon- 
dence between a specific external view and the conceftual 
view. In general, the same kind of differences may exist 
ketween these two levels as may exist between the concerftual 
view and the stored database. For example, records may Le in 


different sequences, fields may have different data types, 
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and so on. Different external views may overlap. That is, 
any number of external views may exist at the same time and 
any number of users Bay share a given external view. Some 
systems allow the definition of one external view tc be 
expressed in terms cf others without always reguiring an 
explicit definition of the mapping to the conceptual level. 
If various “egternal vVilews' are Strictly ~Eclated to ene 
another, this will be a very useful feature of the systen. 

Referring agaln Geer Figew 2227 there still remain three 
components of the architecture: the database managenent 
system (DBMS), the database administrator (DBA), and the 
user interface. The DBMS is the software that handles all 
access to the database. The basic steps that occur in a DEMS 
are the following [Ref. 3:pp. 17-25] : 

1. A user issues an access request, using some partic- 

ular data manipulation language, 

2. the DBMS intercepts the reguest and interprets it, 

3. the DBMS inspects, in turn, the external schema, the 
external/fconcertual maprfing, the conceptual schena, 
the conceptual/internal mapping, and the storage 
structure definition, and 

4. the DBMS performs the necessary operations cn the 
stored database. 

For example, assume that a user wishes to retri€ve a 
particular external record occurrence. in general, the DBMS 
must retrieve all reguired stored record occurrences, 
construct the reguired conceptual record occurrences, and 
then construct the required external record occurrence. At 
each step, data type or other conversions may be necessary. 
Whenever a retrieval request occurs, fields will be required 
from several conceptual record occurrences, and each ccrcep- 
tual reccrd occurrence, in turn, may require fields from 


Several stored record occurrences. 


eae 


The database administrator (DBA), previously discussed 


to some extent, controls the overall database system. We 
will cnly mention the utilities and tools which are required 
to achieve the DBA's tasks. Such utilities would ke an 


essential part of a database system. For instance, loading 
routines, reorganization routines, journaling routines, 
recovery routines, and statistical analysis routines may be 
included as utilities. One of the most important DEA tools 
is the "data dictionary" (not shown in Fig. 2.2). The data 
dictionary is effectively a database in its own right (that 
is, descriptions of cther objects in the system). In partic- 
ular, all the various schemas (external, SONCeP tual, 
internal) are physically stored in both source and okject 
form in the dictionary. A comprehensive dictionary will also 
include cross-reference information, Showing, for examrple, 
which prcegrams use which pieces of data, which departments 
reguire which reports, and so on. It 1S possible to suery 
the dictionary just like any other database, so that the [BA 
can easily discover which programs are likely to be affected 
Ey some change to the systen. 

Awddtas dictionary should help a database user in the 
following ways: [Ref. 7spp. 20-21} 

-Comnmunicating with the other users. 

-Ccentrolling the data elements in a simple and effective 
Manner, that is, introducing new elements into the 
systems, or changing the definitions of the elements. 

-Reducing data redundancy and inconsistency. 

Determining the impact of changes to data elements on 
the total database. 

-Centraiizing the control of the data elements as an aid 
in database design and in expanding the design. 

The user interface, shown in Fig. 2.2, may be defined as 
a boundary in the system below which everything is trans- 
parent (invisible) to the user. Thus, the user interface is 


at tne external level. 
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E. DATA INDEPENDENCE 


The concept of data independence may be easily under- 
stood by first introducing its opposite. Currently, many 
applications are data-dependent. This means that the 
requirements of the application dictate both the way in 
which the data are organized in secondary storage and the 
way in which they are accessed, and, moreover, that knowl- 
edge cf the data structure and access method is built into 
the, applications logic. In this case, the application 
programmer has to knew the data format, the locaticn of 
where the data is stcred, and the access method Sen tells 
how the data is accessed. Changes in any of these items may 
affect the applicaticn program and result in other changes, 
Since the details of these three points may be embedded into 
the application code. It is also likely that as the needs of 
the enterprise change, the fcrmat of the data may change, 
and the data set has to be expanded by adding information 
about different types of entities or additional information 
about existing entities. 

It is said that an application such as above is data- 
dependent because it is impossible to change the storage 
structure (now the data is physically stored) or the access 
method without affecting the application. For example, it 
would not be possible to replace an indexed sequential file 
by a hash-addressed file without making any changes tc the 
application. 

In a database system, there are at least two impertant 
reasons why applications must be data-independent 
| Refs 35) ppc le 

1. Different applications will need different views of 
the same data. 
2. The DBA must have the freedom to change the storage 


structure or access strategy (or both) in response to 


changing reguirements without having Een wiledia 7 
existing applications. For example, the enterprise 
may adopt new standards, application priorities may 
change, new types of storage device may become avail- 
able, and so cn. 

If applications are data-dependent, such changes involve 
corresponding changes to programs, requiring programmers to 
spend an increasing percentage of their time in program 
Maintenance and updating. 

Therefore, it is obvious that the provision of data 
independence is a major objective of database systems. S. 
Atre [Ref. 7:p. 17] defines data independence as 

"The ability to use the database without knowing the 

representation details." 
It can also pe defined as the immunity of applications to 
change in storage structure and access strategy which 
implies that the applications concerned do not depend on any 
one particular storage structure and access strategy. In 
Section D, we have presented an architecture for a database 
system that provides a fundamental principle for achieving 
this cbjective. 

Data independence provides, at acentral location, a 
soluticn tc the problems discussed above. The individual 
application programmer is not required to change the apfpli- 
cation programs to accommodate changes in access methcd or 
locaticn or format of the data. Unfortunately, it is diffi- 
cult to achieve full data independence in a database systen, 
Since a database design defends on the availability of the 
DBMS software packages today, even with the best database 
design. The central location for reflecting changes in the 
storage structure and the access strategy should be anchored 
in the DBMS. The important point here is when, where, why, 


and who should specify the changes to the DBMS, and who 
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should ccntrol these changes? The DBA should ,of course, be 
given these responsibilities. 

The reasons for data independence are summarized as 
follows: 

"1. To allow the DBA to make changes in the content, 
locaticn, representation and organization of a database 
without causing refrogramming of application programs 
which use the datalase. 

2. .To allow the supplier of data POG eee eguip- 
ment and software to introduce new technologies without 
causing reprogramming of the customer's application. 

30mm TO Laecwhetate cae shar nd by allowing the same 
data _ tc appear to ke organized differently for different 
applicaticn programs. 

; YO See roms oe een ea program Beet OP Roe and, 
in particular, to facilitate the development of programs 
for interactive database processing. 

5. To provide the centralization of control needed 
by the DBA to insure the security and integrity cf the 
database." [Ref. 7Jspp. 17-18] 

The levels of abstraction, mentioned in Section D akove, 
from the external view to conceptual to internal view, 
provides two stages of "data independence." In a well- 
designed database system, the internal schema can be modi- 
fied Ly the DBA withcut altering the conceptual schema or 
requiring a redefinition of the external schemas (or 
subschemas). This independence 1s known as physical data 
independence. The advartage of physical data independence is 
that it permits “tuning” of the internal schema for effi- 
ciency while allowing application programs to run as if no 
change had occurred [heft. )92pp. 5-98 

The relationship TFetween external views and the concep- 
tual view also gives a type of independence called logicai 
data independence. Many changes to the conceptual schema can 
ke made without affecting existing external schemas, and 
other changes to the conceptual Schema can be made if the 
externalyconceptual mapping is redefined by the DBA, Again, 
no change to the application programs is necessary 


[Ref£. 9:Epeea—cae 
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In order to use standard terms as much as possible, it 
is essential to give some definitions for a database systen. 
mE stored rield is the smallest named unit of data 
stored in the database. The database, in general 


contains Many _occurrences or instances of @ach o 
several types of stcred fields. 


A stored record is a_Ramed collection of associated 
stored fields. A stored record occurrence or instance 
consists of a grour of related stored field occurrences 
(and representS an association between ba i Invgest 
systems the .stored record occurrence is the unit of 
access to the datakase. 


A stored file is the (named) collection of ail S 


eile 
rences of one type of stored record." [Ref. 3:p. } 


i 
He comerude this Chapter Ly pointing out that the DENS 
can provide independence fron: 
e-underlying representations such as frepresentaticn of 
numeric data, representation of character data, data 
encoding/decoding, and units for numeric data. 
~letem stpueturce™ such as materdialization of eoemruted 
fic ldcweoi@estructume of records and files. 


A. INTRODUCTION 


Designing a datatase is a difficult,complex and tine- 
consuming process. Unfortunately, inadequate datalkases 
result because they cannot satisfy the present or future 
organizational requirements. 

The process of developing a database structure from user 
reguirements is called datakase design. Many database 
designers have argued that there are at least two separate 
steps in the database design process: the design cf a 
logical] database Structure which is processible by the DBMS 
and describes the user's view of the data, and the selection 
of a physical structure that includes data representaticn or 
encoding, access methods, and physical clustering of data. 
Cther than the logical/physical description, however, the 
overall structure of the design process has not been well 
defined, and even the logical/physical boundary has been 
open to considerable dispute. 

General informaticn requirements include a statement of 
the okjectives of the database system, definition of the 
data elements to be included in the database, and a deéscrip- 
tion of data element usage in the users' organizations. 
These requirements are not tied to anv specific application; 
therefore, database structure design based on such reguire- 
ments is considered to be advantageous for long-term data- 
Eases that must be adaptable to changing applications. 

Processing requirements consist of three distinguishable 
components; specific data items reyuired for each applica- 
tion, the data volume (number of data occurrences), and 


processing frequencies in terms of the number of times each 
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application must be run per unit time. DBMS specifications 
and the operating system/hardware configuration are also 
used Ey the designer. 

Performance measures and performance constraints should 
be considered by the designer. Typical constraints include 
upper bcunds on response times to queries, recovery tines 
irom system crashes, or specific data needed to sufport 
certain security or integrity requirements. 

Two major results of the database design process are 
the ccmplete database structure and guidelines for applica- 
tion programmers based on database structure and processing 


reguirements. 


Ee PATAPASE SYSTEB LIFE CYCLE 


The database system life cycle is a convenient and 
useful framework from which to view the database system as 
it evclves over time. This framework provides an ordered 
background to the functions of a database administrator and 
is divided into three separate phases: analysis and design, 
datakase operation, and reorganization. These three phases 
are ccmpesed of the following steps: 

. Analysis and design phase 
1. Requirements formulation and analysis 
Zz. Conceptual deésign 
3. Implementaticn design 
4. Physical design 
- Datakase implementation and operation phase 
1. Database implementation 
2. Cperation and monitoring 


- Reorganization fhase (Modification and adaptation) 
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C. ANALYSIS AND DESIGN PHASE 


A stepwise design methodology for database designer or 
datakase administratcr will be explained in this section. 
The general interconnections between steps are illustrated 
in Figure 3.7 [Refs 5Ssp.. 26 1. 


1. Requirements Formulaticn and Analysis 





Requirements formulaticn and analysis constitute the 
most important step cf the entire database design process, 
Since most subsequent design decisions are based on this 
step. It i1is,however, the most poorly defined and time- 
consuring step of the entire process. 

Contemporary database applications are very Eroad 
and very sophisticated. Many diverse applications may use 
the same integrated database. The design of a database to 
Support all the applications becomes very complex. A design, 
without sufficient information to Support the analysis, will 
not be valid. 

The major task is collecting information content and 
processing requirements from all the identified and poten- 
tial users of the database. Analysis of the requirements 
ensures the consistency of users' objectives as well as the 
consistency of their views of the organization's information 
flow. 

This activity includes the establishment cf organi- 
zational okjectives, derivation of specific database 
reguirements from those objectives or directly from manage- 
ment and nonmanagement personnel, and documentation of those 
requirements in a form that is aygreeable to both end users 
and datakase designers. The technigue used 1S _ fersonal 
interviews with various levels of management and _ key 
employees involved in the processing of data and Services in 


the OLganizatron-) |) RGr-m oO. .e onl 
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Basic Database Design Steps. 


There is a need for ccrporate requirements analysis 
in the requirements formulation and analysis step. Data 
items and their relationships must be defined and conflicts 
are at least recognized, if not resolved, during ccrferate 


requirements analysis. 


Dafferent departments use different names for the 
Same things, and the same names for different things, so 
that a preliminary ccmmon view of data and processes pnust be 
available before later steps can provide reliable results. 
Such a ccmmon view can be derived only in cooperation with 
users. However, this common view will not necessarily 
resemble the final database structure. In conclusion, there 
are two design constraints for this step: 

«accurately modeling real world requirements 


aggregating individual views. 


2. Conceptual Design 


eed Se Se —e 





Conceptual design deals with information independent 
of any actual implementation (i.e. any particular hardware 
cr software system). The main purpose of conceptual design 
is to represent information in a form that is comprehensitle 
to the user independent of system specifics, but imrplemen- 
table on several systems. The result of conceptual design is 
called the conceptual schema because it is a representation 
of the user's “world” view and independent of any DBMS soft- 


ware or hardware considerations. 


This step results in a high-level representaticn of 
diverse users' information requirements such as an entity- 
relationship (E-R) diagram or a Semantic Data Modei (SDM) 
application. | 

In most representation mechanisms, the users 
describe their infcrmation needs in terms of entities, 


attributes, and relationships (E-R diagrams), or in terms of 


40 


records, items, and sets using a DBMS's data descriftion 
danguage (DDL). It is clear that a great deal of gererality 
and potential design optimality are lost when the user is 
restricted to a particular low-level data descriftion 
language instead of a higher-level representation mechanism 
to specify their information requirements. Similarly, the 
goal of the Semantic [Tata Model (SDM) is as follows: 
WOuld Oa lb 1S the dee of a higher-level database model 
that will enable the database esigner to naturally and 
directly incorporate more of the semantics of a database 
into its schema. Such. a .semantics~based database 
description and structuring formalism is intended to 
serve aS a_ natural application Woes mechanism to 
capture and express he structure of the application 
envircnnent in the structure of the datackase." 
fervet.. 10; p. 2.) 

There are two major reasons for a deSigner to use a 
high level of abstraction in the design process. Pires 
entities, attributes, and relationships are not always 
explicitly distinguished and the design decisions are often 


fuzzy~ Second, the problem cf consistency checking would be 


Simplified i1£ a commen, high-level information representa- 
tion for conceptual information structures could be 
develcped. 


AS an example, concertual deSign can be done by 
entity modeling which is the representation and integration 
of user views in terms of entity diagrams. There are four 
bkasic design decisicns required to formulate the entity 
diagrams. 

1. Selection of entities 
2. Selection of ertity attributes 
3. Selection of key attributes for entities 


4. Selection of relationshirs between entities. 
3. iImplementaticn Design 


The major goal of the implementation design step 1s 


to use the results cf the conceptual design step and the 
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processing reguirenents as Lape to create a 
DBMS~-processible schema as output. Refinements to the data- 
base structure that cccur during this design step are devel- 
oped fren the viewpoint of Satisfying DBMS-dependent 
constraints as well as constraints specified in the user 
requirements. 

Implementaticn design contains database structure 
design and design of programs. The database structure is a 
DBMS-processible data definition or schema, usually 
expressed ina data definition language. If there are fhys- 
ical parameters to te selected in a data definition 
language, selection of appropriate characteristics are 
deferred until the rfhysical step. The program design is 
related to the development of structured programs using the 
host language and data manipulation language of the C3MS. 
Conceptual design and implementation design steps are 


together referred to as logical design by some authors. 


4. Physical Design 


Fhysical datatase design is . the process of devel- 
oping an efficient, implementable physical database struc- 
ture from a given logical database structure that has been 
shown to satisfy user information requirements. 

Physicai datakase structure represents stored record 
fOrEnat. access method, and device allocations for a 
multiple-record-type database. 

Major decision classes of physical design are : 

1. Stored record format design. This contains all forms 
of data representation and compression in stored 
records. It also contains record partitioning. Record 
partitioning defines an allocation of individual data 
items to separate physical devices of the same or 


different type, or separate extents on the same 
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device, so that the total cost of accessing data for 
a given set of user applications is minimized. 

Access method design. An access method frovides 
storage and retreival capabilities for data stored 
on physical devices, usually secondary storage. 
Storage structure and search mechanisms are two 
important compenents of an access method. Storage 
structure defines the limits of possible access faths 
through indexes and stored records, and the search 
mechanisms define which paths are to be selected for 
diven applications. A given file may have many asso- 
ciated access faths. Physical databases may reguire 
several primary access paths. Efficiency considera- 
tion of the dominant application descriktes the 
design of individual files. Access time can be 
greatly reduced through secondary indexes, but at the 
expense of increased storage space overhead and index 
maintenance. 

Stored record clustering. The physical allocaticn of 
stored records to physical extents is one of the most 
miportant design decisions. Record clustering 
involves the allocation of records of different tyres 
into physical clusters to take advantage of fhysical 
Seguentiality whenever possible. Analysis of record 
clustering must take access path configurations into 
account to avoid access-time degradation due tc a new 
clacement of records. Clustering also involves block- 
size selection for efficient retreival. Blocks ina 
given clustered extent are influenced Ey stored 
record-size and storage characteristics of the fhys- 


ical devices. 
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IV. DATABASE MODELS 


Aw. INTRCDOUCTION 


A data model is a representation of data and their rela- 
tionshirs which describes ideas about the real world. Data 
models have been used to represent a conceptual view and an 
implementation view of data. Therefore, we will classify 
the data models as fcllows: 

iets ee Ccnceptual data models 
Semantic Data Model (SDM) 
Entity-Relationship (E-R) model 
2. Implementation data models 
Relational data model 
Hierarchical data model 
Network data model 

One of the major responsibilities of the database admin- 
istrator is to develop a conceptual model of the organiza- 
tion. The conceptuai model iS a communications tool between 
the various users of data, and it is developed without any 
concern for physical tepresentation. 

The conceptual model should be independent of a database 
management svstem. The conceptual model has to be mapped to 
the implementation mcdel used as the underlying structure 
for a DBMS. The commercial DEMSS available today are based 
either on a relational data model, hierarchical data ncdel, 
a network data model, Or a combinaticn of then. Lt) as 
important to understand that the DBMS is not a factor in 
designing a conceptual model, but designing an inpleéementa- 
tion model is dependent on the DBMS to be used. 

In reality, the [EMS is frequently given, and the data- 


kase administrator has no choice. The reason for this 
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Situation is that a farticular computer may support only one 
or twe DBMS'. On contrast, the choice cf the DBMS should be 
made after the concertual model is designed. The process of 
Mapping from the ccnceptual model to the implementation 
model should be examined while evaluating different DBMS 
packages. At that time, the DBMS should be a dominant factor 
when selecting the ccmputer. 


Es CCNCEPTUAL DATA MCDELS 
1. Semantic Da 


Contemporary DBMSs are based on database anmecdels 
which have limited capabilities for expressing the meaning 
or a database. These database models do not adequately 
relate a database to its corresponding application environ- 
ment. Therefore, a database model is needed which allcws us 
to capture much more of the meaning of a database. The 
semantic database model is a higher-level database mcdel and 
it is designed to prcvide features for the natural modeling 
of datakase application environments. The semantic data 
model provides a precise documentation and communication 
medium for database users. More details of the semantic data 


model will ke given in Chapter V. 


Zoeeinewentity-Relationship (E=R) Model 


The entity-relationship model is a conceptual data 
model and is based on the view that the real world consists 
of entities and relationships between entities. In this 
nodel, real world objects and their characteristics are 
represented by entities and their attributes. 

An entity is a "thing" which may be distinctly iden- 
tified; examples are records of officers, units, and so 
forth. Individual entities are classified into entity sets- 


that is, collections of entities that may be descrited by 


the same set of properties. Entities with the same attri- 
butes fall into one entity set. All OFFICER records fort the 
officer entity set; all UNIT records form the unit entity 
set. A relationshir set is an association between two or 
more entity sets. The relationship has its own data (€.g., 
date cf assignment, crder number of assignment). | 

Entities and relationships can be represented 
diagrammatically by an entity-relationship (E-R) diagran. 
Each entity set is represented by a rectangular bcx, and 
each relationship set by a diamond-shaped box in this 
diagram. The diamond-shaped boxes (relationship sets) are 
joined to the rectangular boxes (entity sets of entities 
which participate in the relationship). Figure 4.1 presents 
a diagram that shows the relationship of the officer and 
unit ertity sets. 
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Figure 4.1 E-R Diagram for OFPICER/UNIT Relationshifr. 
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Three Tables of Data for the E-R Diagran. 
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C. IMPLEMENTATION DATA MODELS 


Implementation data models are chosen tomer cvide 
constructs that can mecdel a variety of user problems. Most 
commercial database management systems support a single data 
model. It is common to classify these models into three 
Glasses: 

~The relational mcdel 
~The hierarchical model 


~The network model 


The twain difference between the three classes of data 
models lies in the representation of the relaticnships 


ketween the entities. 
1. The Relational Data Mode 


In a relational data mecdel, the entities and their 
relationships are represented with two-dimensional takles. 
Every takle represents a relation and is made up of rows and 
columns. Rows of such tables are generally referred to as 
tuples. Likewise, cclumns are usually referred to as attri- 
butes. Figure 4.3 sShews three relations, one for officers, 


one for units, and tke other for assignments. 
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Figure 4.3 Sample Data in Relational Fora. 
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The relational data model approach is a high-level 
data retrieval and manipulation tool that separates the user 
from the complexity cf storage structures, data structures, 
and access paths. Access paths do not have to be predefined. 
The lack of predefined physical access paths means that 
relational databases must be exhaustively searched to 
satisfy a querry. The advantages of a relational data model 
is its simplicity and its well-developed theoretical 
foundation. 

There are several commercially available DBMS fack- 
ages Eased on the relational model. Some of them are: IBM's 
SOL/DS, and System h, Relational Softwares Inc. *=s CRAGr 
Relational Technology Inc.'s INGRES, BExztton-Lee Unests 
IDM500, Honeywell's MRDS/LINUS, Ashton-Tate's dbase II, and 
National Ccmputer Sharing Services' NOMAD. The relational 
data model will be discussed in greater detail in Chapter 
Rie 


2. Hierarchical Lata Model 


The hierarchical data model 1S made up of a hier- 
arcchy of the entity types involving a dominant (root) entity 
type and one or more subordinate (dependent) entity types at 
the lower levels. The relationship between a dominant and a 
subordinate entity tyre is one-to-many. That is, for a given 
dominant entity there may be many subordinate entity types, 
and tor a yiven dominant entity occurrence, there can be 
many occurrences of a subordinate entity type. 

The relative simplicity and ease of use the hier- 
archical data model and the familiarity of data prccessing 
users with a hierarchy are major advantages of a hierarch- 
ical data model. Disadvantages of a hierarcnical data rodel 
are: 

-The operations of insertion.and deletion are complex. 
-Any subordinate node is accessible only through its 


dcrinant node. 
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some examples of commercially available DBMS fack- 
ages fased on hierarchical data model are IBM's IMS, Intel's 
mero t fe 2000, and Iniormatics ‘WARK IV. 


3. Network Data Model 





The concept of the network data model is based on 
the werk of the COLASYL DETG (Conference On Data Systems 
Languages Database Task Group). The network data todel 
employs the set construct. The term set has a different 
meaning than its mathematical sense. 

The network model of a system is diagrammatically 
represented by a data structure diagram, which was intro- 
duced by C.W. Bachman. In this diayram a rectangle enclecsing 
Gendmee denotes an entity or record type. Each record type is 
composed of data items; but the particular item names are 
not shown in this description, although they are defined in 
the complete database description by the cata definition 
language. In a data structure diagram, a directed arrow 
Gommects two record types. The record type located at the 
tail cf the arrow is called the owner-record type, and the 
record type located at the head is called the memper-reccrd 
type. The arrow directed from owner to member 1s caileda 
Set type and it is given a name. Thus the data structure 
Giagram in Figure 4.4 represents the set type ASSIGNETL-10O. 
Here UNIT is the owner record type, and OFFICER is' the 
member record type. 

The existence of a set type specifies that there are 
associations between records of heterogeneous types in the 
datakase. This allcws the designer to interrelate diverse 
record types and thus to model associatiors between diverse 
entities in the real world. 

There is a distinction between a set type and a set 
Occurrence as well as between record type and reccrd occur- 
rence. For example, SMITH and DAVIS denote two reccrd cccur- 


Bemees Within the reccrd type OFFICER. 
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Figure 4.4 A Network Structure. 


The existance of a set type is declared by naming 
it, stating its owner-record type (only one) and its member- 
record type. A set occurrence iS one occurrence of the 
owner-record type together with zero or more occurrences of 
each membrer-record tyre. This means that there iS an occur- 
rence of a set type whenever there is an occurrence of its 
owner~record type. A set occurrence iS an one-to-many rcela- 
tionship that is the basic building block for relating 
diverse records. The following associations exist among the 
cwhner and member records of any set occurrence: 

~-Given an owner record, it is possible to process the 
related member records of that set occurrence. 

-Given amember record, it is possible to process the 
related owner reccrd of that set occurrence. 

.Given a member record, it iSspossable tosprocess»ctheEr 
member records in the same set occurrence. 

Any implementation that conforms to these three 
rules is a valid inpilementation of the concept of a_ set 
type. Two occurrences of the ASSIGNED-TO set type are shcwn 


in Figure 4.5 
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Figure 4.5 Occurences of Set Type ASSIGNED-TO. 


A set occurrence with no membeéer-record occurrences 
is called an "empty set." A given member record may exist in 
only cne set occurrence of a given type. A member record 
cannot simultaneously belong to two owner records for the 
Same set tyre. 

It is also fossible to implement hierarchies (one- 
to-many relationships) and many-to-many relationships with 
set structures in the network data model. 

The major advantage of the network model is that 
there are successful database management systems that use 
the metwork “data model as the basic structure. The main 
disadvantage of the network model is its complexity. 

There are several commercially available database 
Management system packayes based on this model. Some of them 
dees SMrumzoughis' DNS II, Cpcts DMS-170, Cullinane's IDMS, 
Cianc@s"s TOTAL, Hcneywell's IDS/II, Univac's DMS1100, 
Tigital Equipment Corroration's DBMS-10/20. 
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A. INTRODUCTION 


The semantic database model (SDM) waS introduced by 
Hammer and McLeod [Ref. 10spp. 351-376], and can serve as a 
conceptual database model in the database desSiygn freccess. 
The semantic database model allows the same informaticn to 
be viewed in several ways. 

Each database is a model of some real world environment. 
Tae ccntents of a database are intended to reflect a snarf- 
shot cf the state of this real world environment, and every 
change to the database should reflect an event occuring in 
that environment. Tkerefore, a logical database represents 
selected portions of reality. Eventually, we may ask ques- 
tions like: How do we represent the real world environment? 
What are the structures of the real world environment? Also, 
we may ask the questions about the other aspect of the 
problem, such as: How do we represent the conceptual wecrld? 


What are the structures of the conceptual world? 


1. Structures of Real World Environment 


The first structure is the object. The real world 
has objects; they are phenomena that can be represented by 
NOUNS. AR Chime er ea) Une, an assignment_request are all 
objects. Objects are grouped into object classes by 
performing generalization. Objects are grouped together on 
the kEasis of Similarities. CFFICER is an example of an 
object class. 

Objects have froperties. A property 1s a character- 
istic cf an object. For example, an officer's name and rank 


are properties. Preperties are inherent in objects. the 
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collection of all possible values of a property is called a 
property value set. The property value set for officer rank 
is the collection of all ranks for all possible officer 
objects. 

A fact is an assertion that, for a given object, a 
particular property has a particular item from the pfrorerty 
value set. The statement that the rank of DAVIS is ‘caftain' 
is a fact. A fact is the intersection of a given object with 
a given property value set. 

Objects can ke related to one another. These rela- 
tions are called associations. Associations may exist 
tetween cbjects of the same class or of different classes. 
The association ‘commander' exists between objects of the 
Same class (OFFICER). The association ‘assignment’ exists 
ketween two different classes (between OFFICER and UNIT 
cbject classes.) Also, an object may have an associaticn to 
itself. Associations May have properties just as objects 
have properties. The ‘assignment’ association may have a 
property such as Date_of_assignment. 

A summary of real world structures 1S shown in 


Peo ure 5.1. 


Database designers should define a conceptual struc- 
ture for each of the real world structures. 

An entity is a conceptual representation of an 
object. Entities may be grouped into entity classes. An 
entity class is a representation of an object class. An 
Memumetasseconsists of all the entities that represent the 
cbhjects of an object class. If there is an object class 
Cared Officer, then there can be an entity class called 
CE CER. 


a 


| Structure Definition and Examples 


Okject Phenomena that can be Besontes by 
neouns. An officer, a unit. 


Okject Classes A group of objects formed by general- 
ization. OFFICER, Ulmre 


Properties | aS SO I Rc of objects. Name, 
ank. 


Property value The collection of all possible 
set values of a given property. 
All ranks for Officers. 


Pacer The intersection of a given OPE 
with a_ given proper value set. 
Rank of officer DAViS as Captain. 
Association A connecticn of objects of the sane 
or different classes. | 
TAVIS 1s assigned to unit ALPHA. 


ee 


ee a eee ee en a = epee cet, SS ee oe 


Figure 5.1 Structures in the Real World. 


properties of objects. Attributes describe and characterize 
entities. Rank,Name,Date_of_assignment are examples of 
aktrikutcess. 

The conceptual structure that represents froferty 
value sets is called a domain. A domain is the collection 
cf all values that an attribute can have. The dcmain of 
Names 1s a ocollecticn of character strings of scme afpro- 
priate length. The dcmain of height (in centimeters) is the 
integers from 0 to 250. 

A value is tke representation of a fact. The value 
is the intersection cf a given entity with a given dcmain. A 
tion. Relationships may exist among entities in the same 
Class or in different classes. An entity may have a rela- 


tionship to itself. A relationship may have attributes, just 
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aS associations may have properties. The assignment relation 
May have a property such as Date_of _assignment. 
Associations between real wcrld structures and 


concerftual structures are shown in Figure 5.2 





| Keal World Structure Conceptual Structure 
wr we we wr we ne we ew aes ae wre we ee we ee ee ee ee ee ee ee ee i 
Object Ee 1 & ¥ | 
Object Class Entity Class 
| : Property Attribute | 
| Property value set Domain | 
| Fact Value | 
oa Pae | | 

| Association Relationship 
io { 
oe ee J 





Figure 5.2 Real World and Conceptual Structures. 


B. GENERAL PRINCIPLES OF DESIGNING SDM 


As described in {Ref. 10:p. 355], there are general 
principles of database organizaticn to support the design of 
SDM. These are: 

5 fale A database is to be viewed as_a collection of 
entities that correspond to the actual objects in the 
application environtrent. 


2)- The entities in a. database. are oryanized into 
classes that are meaningful collections of entities. 


_ (3)- _The classes of a database are not in general 
independent, but rather are logically related by means 
of interclass connections. 


(4)- Database entities and, classes have attributes 
that describe their characteristics and relate them tc 
other database entities. An attribute value may be 


derived trom other values in the database. 


5) 


_ (5)-. There are several rimitive ways of defining 
i1nterclass connecticns and derived attributes, corre- 
sponding to the most common types of information redun- 


ee appearing in database applications. These 
facijities integrate multiple ways of viewin the same 
basic information, and rovide uildin blocks fou 
6 eS crane complex attributes an interclass 
relaticnships." 


C. DEFINING ENTITY CLASSES 


The kasic format of an SDM entity class descripticn is 


Shown in Figumeiie. 3 Jeece-mo- Pp. 215 


ENTITY _CLASS_NAME 
[ dESCL1 PUHCOiw eee = =" > - So oe } 


Linterelasssecnnection + .-==>> =e J 


member attrikutes: 


Attribute_name 
f dOSClr aGlaon: ! s=-=>s=--- Tee oe j 
Value Glass >. -2---—-s4cc5 
mandatcry 
multivalued jf no overlap in values ] 
exhausts value class] 
not changeable 
inverse: Attribute_nanme} 
match: Attribdute_namel of ENTITY CLASS 
; _on Attribute_name2} 
PGerivation: -se5---Sseeme j 


{class attributes: 
Attribute_nanme 
[descriftion: ------------------ ] 
Value Glass: =— {ee 
Lden lV atlOn ieee === eee el 


f identifiers: 
Attribute _namel+[Attributemsate2r| 2. |] 


wT iene ER ices) cei Ieee St ey OO eal 


Figure 5.3 Format of SDM Entity Class Descriptions. 
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[ey ng Se ee ee eee | 


PemoUNedatasase | 1S a collection of entities. Entities are 
organized into classes. The structure and organization of an 
SDM database 1S specified by an SDM schema. SDM schema 
identifies the classes in the database. Appendix A is an 
example cf an SDM schema for ‘Personnel Database.' 
Each entity class in an SDM schema has the follewing 
features: 
tae A Class Rane identifies the class. Each class name 
must be unique with respect to all class names used 
1n a schema. OFFICER, UNIT, ASSIGNMENT REQUEST are 
ali class names. 
2- The class has a collection of members (the entities). 
Each class in an SDM schema is a homogeneous ccllec- 


tion of one type of entity. 


3. A textual class description is an optional feature of 
entity class. It describes the meaning and contents 
of the class. 

4. The class has several attributes which describe the 
members of that class or the class as a whole. There 
are two types cf attributes: Member attributes and 
Class attributes. For example, each member of class 
UNIT has attributes Name, Unit_category, Location 
which identify the unit's name, its category, and its 
lccation, respectively. A class attribute descrites a 
rroperty of a class taken as a whole. For example, 
the class ASSIGNMENT_REQUEST has the attribute 
Number_of requests, which identifies the number of 
requests issued in the current year. 

5. An SDM class can either be base or nonbase. A base 
class is one that is defined independently of all 
other classes in the database. In Appendix_A_ the 
class OFFICER is a base class. It exists indepen- 


dently of other classes. If we think of an entity 


3), 


class such as COMMANDERS, it isa subset of the 


OFFICER class, and so can fe derived from this class. 


The class, COMMANDERS is called as an nonbase class, 


and it does not have independent existance. Every 
nenbase class has an entry, interclass connecticn, 
that describes how the class is to be constructed. 

If the class is a base class, it has identifiers. 
These are attributes that uniguely identify memters. 
Fer example, class OFFICER has the unigue identifier, 


Military sip. 


D. DEFINING ATTRIBUTES 


There 1s a collection of attributes in each class 


description. These attributes represent the properties of 


objects. Each attribute has the following features. 


Lic 


An attribute name identifies the attribute. Attribute 
hames must be wnigue within the class where they are 
defined. They must be unigue within all classes that 
are derived from their class of definiticn. 
Date _of promoticn, Main_branch are examples of attri- 
bute nhames. 

The attribute has a value which is either an entity 
in the database (a member of some class) or a collec- 
tion of such entities. The value of an attribute is 
selected from its underlying value class. Value 
class is another term for domain that contains the 
permissible values of the attribute. The value class 
of an attribute may be any class in the schema or may 
be the special value NUIL. se. , ene value.) DATS, 
ERANCHES are examples of value classes. 

The applicability of the attribute is specified by 


indicating that the attribute is either: 


6¢ 


(a) a member attribute, which applies to each member 
of the class, and so has a value for each memkter 
(6006, Hilitary ID of OFFICER), or 


(b) A class attribute, which applies to a class aga 
whole, and has only one value for the class (€.¢., 
Number_of_requests of ASSIGNMENT_REQUEST.) 

A textual attribute description 1s an oftional 
feature that describes tha meaning and purpose of the 
attribute. This serves as an integrated form of 
database documentation. | 

The attribute is specified as either single valued or 
multivalued. A sSingle-valued attribute has one 
value, that is, amember of the value class of the 
attribute. The value of a multivalued attribute is a 
subclass of the value class (E.ge, 
Foreign_language_capability of OFFICER is a nultiva- 
lued attribute.) The default value for this feature 
is single valued. 

An attribute can be specified aS mandatory, which 
means that a full value is not allowed for it. For 
Saige, attrEioute Military [TD of OFFICER is speci- 
fied as "mandatory"; this models the fact that every 
O@eereern has aa, Military ID. 

An attribute can be specified as not changeable, 
which means that once set to a nonnull value, this 
value cannot be aitered except to correct an error. 
For example, attribute Military_ID of OFFICER is 
specified as "not changeable." 


A member attritute can be reguired to be exhaustive 





of its value class. This means that every member of 
the value class of the attribute is used. 

Finally, multivalued attributes can be specified as 
nonoverlapping. This means that a member of the 


value class can be used at most once. 
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E. MEMBER ATTRIBUTE INTERRELATIONSHIPS 


The semantic data modei frovides three facilities for 
defining interrelationships among member attributes. these 


facilities are inversion, matching, and derivation. 
1. Inversion 


The inverse facility causes two entities tc be 
contained within each other. Member attribute A1l of class Cl 
can be specified as the inverse of member attribute A2 cf C2 
which means that the value of Al for amember M1 of Cl 
cofsists of those members of C2 whose value of A2 is M1. 

Inverses are always specified Fy a pair of attri- 
EutesS which establishes a binary association between the 
members cf the classes. For example, in Appendix_A the 
entity classes OFFICER and UNIT are inverses of each other. 
In OFFICER, the attribute Unit_assigned has the value class 
UNIT, and the inverse attribute Officer_assigned. In UNIT, 
the attribute Officer_assigned has the value class OFFICER, 


and the inverse attribute Unit_assigned. 
2. | Naa cinanine 


The second SIM facility for representing relaticn- 
Ships 1S Matching. hKith matching, a member of one entity 
class is matched with a member of another entity class. The 
value of the match attribute A1l for the member M1 of ciass 
C1 is determined as fcllows. 
1. A member M2 of some class C2 is found that has M1 as 
its value of gember attribute A2. 
2. The value of member attribute A3 for M2 is used as 
the value of Al for M1. 
For a multivalued attribute (call it Al), it is 
permissiktle for each member of C1 to match to several 


members of C2; in this case, the collection of A3 values is 


the value of attribute A1l. In other words, the value cf an 
attribute in one of the members is moved to the other. 

Inversion and matching provide multiple ways of 
viewing n-ary associations among entities. Matching suprorts 
kinary and higher degree associations, while inversion 
allows the specification of binary associations. 

For example, a matching specification in Appendix A 
indicates that the value of the attribute 
Foreign_language_capability of a member O of class OFFICER 
is equal to the value of attribute Foreign_language of the 
member r of class FOREIGN_LANGUAGE whose FID value is OQ. 


3. Derivation 


SDM provides the ability to define an attribute 
whose value is calculated from other information in the 
database. Such an attribute is named derived. 

The approach is to provide a small vocabulary of 
high-level attribute derivation primitives that directly 
model the most common types of derived information. Each of 
these primitives provides a way of specifying one method of 


computing a derived attribute. 


F. CIASS ATTRIBUTE INTERRELATIONSHIPS 


Attribute derivation primitives for member attrikutes 
can be used to define derived class attributes, as these 
primitives derive attribute values from those cf cther 
attrikutes. Of course, instead of deriving the value of a 
member attribute from the value of other member attributes, 
the class attribute [frigitives will derive the value ofa 
Class attribute from the value of other class attributes. 
Moreover, there are two other primitives that can be used in 


the definition of derived class attributes: 


Gre 


An attribute can be defined so that its value equals 
the number of members in the class it modifies. Fer 
example, attritute Number_of_reguests is derived from 
ASSIGNMENT REQUEST record by Summation of members as 
specified. 

An attribute can be defined whose value is a function 
of a numeric member attribute of a class; the func- 
tions available are "maximum", "Dinimun", "average", 
and "sum" taken over a member attribute. The ccnEeue 
tation of the function is made over the members of 


the class. 
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VI. RELATIONAL DATABASE MODEL 


A. RELATIONAL DATA STRUCTURE 


In order to explain the relational data structure, it 
will be very helpful to use the sample data in relational 
form. Figure 6.1 reflects a relational view of the data 
which is organized into three tables: OFFICER (officers who 
are in the army), CCURSES (all courses which are offered), 
and CCURSE_ATTENDED (officers who took some courses). The 
OFFICER table contains, for each officer, a military identi- 
ficaticn number, rank, name, and the city where the officer 
Was born; the COURSES table contains, for each course, a 
course ccde, course name, brief description of that course, 
duration, and location where the course is offered; and the 
COURSE_ATTENDED table contains, for each grade, a military 
identification number, a course code, and a grade taken. The 
following assumptions regarding officers, courses, and 
course attended are made. Each officer has a unigue military 
ID number, exactly cne rank, hame, and city name. Each 
course has a unigue course code, exactly one course name, 
description of the ccurse, duration, and location. At any 
given time, no more than one grade exists for a given 


officer/course combination. 


Assume that we are given a coilection of sets El, 
Op eee ,50 (they are not necessarily distinct), Risa 
relaticn on those n sets if it is a set of ordered n-tuples 
<e1, €2, --- , en> such that e1 belongs to El, e2 belongs to 
Pees, en belongs to En. Sets wou, B25 <6 , En are the 


domains of R. The value nis the degree of R. It is 


65 























OFFICER | 

(a2 | EAE) MAY W/ EH T| copuseanmmoe 

Ir1_| Maj__| Smith [Berkeley | =2E2-| SCORE.) _2RADE| | 

ciZ Capt James |Newyork BE) ian Ci | ee 

| |iD3 | Lt | Bichard|monterey| | 1p1 | c2 | a 4 | 

| GS Se EO eae | 

ipz C1) | Ae { | 

| rn ee ee 

ae || “aes | 

“w2 | ca | c+ | | 

bps beri | 

COURSES oS | 

(SCODE|_Tz2LE 1) CEScRIPT _______|_DURL|_Eocazion | | 

5 RODE awa | DOR eta Precl a aeaeecoe 

C2 Cobol Copol Prag. tanga 8 Monterey 

| | C3 | Digelect| Digital Machines | 12 | Berkeley | | 

| i “ch | wepsys | Weapon Systems | 114 | Monterey | 
(b) 

= | 


Figure 6.1 Sample Data in Relational Form. 


sometimes called arity n [Ref. 3:pp. 83-93], {kEet. Seep. 
Wi ees) 

From the mathematical set-theory perspective, we can 
give another egquivalent definition of a relation that is 
sometimes useful. A relation is any subset of the Cartesian 
product cf one or more domains. For example, if we have n 
sets, say n=2, El={a,t}, and E2={0,1,2}, then E'’ x EZ is tite 
Cartesian product of these n sets. That is, it is the set of 
all pessible ordered n-tuples <el,e2> such that e1 pelongs 
tor Ea e2 belongs to E2. The result of this Cartesian 
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Bieeemct OfSEl xeE2 is {(a,0) 7 (ae 1) 7 (a. 2)ue (D709) gelDoel) olde 2) - 
Figure 6.2 , for example, shows the Cartesian product of two 
eeu) MEDBana CCODE (Military 2D No., and Course Code). 





MID. VWEeoOdDE 
MID: COURSE: [==55- ===== 
| LD vi Cl 
4 tp1 [72 | | 
X ecm | \- Baa | 
| ID1 (Ces | | 
| EDZ | C1 ! | 
| | ID2 C2 | 
j ID2 j C37) 


Ce ee - 


Figure 6.2 An Example of a Cartesian Product. 


A relation called COURSES, of degree 5 is illus- 
trated in Figure 6.1 (b). The five domains are sets of 
values representing, respectively, course codes (CCCDE), 
course tities (TITLE), brief description of each course 
(CDESCRIFT), duraticn for each course, and locations where 
courses are offered. The “course title" domain, for example, 
1s the set of all valid course titles; note that there may 
be sone titles included in this domain that do not actually 
appear in the COURSES relation at this particular moment. 

It is convenient to view a relation asa table, 
where each row is a tuple and each column corresponds to one 
component. The columns are often given names, called attri- 
Eutes. The number cf tuples ina relation is called the 
cardinality of that relation; e.g., the cardinality of the 
COURSES relation is four, and it has five attributes (or 


columns). AS mentioned earlier, a domain can be thought as a 
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pool cf values from which the actual values for a given 
attrikute are drawn. It 1S very important to note that the 
domains of a relation do have an ordering defined among 
them. If we have a tuple (a1,a2, .~... ,an) with n components, 
the value of the j th component in this n-tupie has to be 
drawn from the j th demain. In Figure 6.1 (b), (C2, Cobol, 
Cobol Prog. Language, 8, Monterey) is the second tuple of 
the CCURSES relation, and the value of the fourth ccmpenent 
of this tuple under the attribute named DURATION is drawn 
fron the fourth domain which is a set of positive intecsers, 
ranging from 0 to 99S. Mathematically speaking, the rear- 
rangement of the five columns of the COURSES relation into 
some different order results a different relation. 

It 1s important to note the difference between a 
domain and attributes which are drawn from that domain. An 
attritute represents the use of a domain within a relation. 
Figure 6.3 shows a part of a relational schema in which four 
domains (MILITARY ID, MILITARY_RANK, OFFICER NAME, and 
LOCATICN) and one relation (OFFICER) have been defined by 
using a data definiticn language {Ref. 3:pp. 83-93]. the 
relation is declared with four attributes (MID, RANK, NAME, 
and CITY), and each attribute is specified as being drawn 
irem a corresponding domain. It iS sometimes possitle that 
the dcmains of more than one attribute can be the same. in 
cther words, those attributes can use the same domain in 
common. To differentiate between attributes that have the 
same dcmain, each if given a unigue attribute name. A 
crucial feature of relational data structure is that associ- 
ations retween tuples are represented solely by data values 
in attrikutes (columns) drawn from a common domain. 

All relations in a relational database are required 
to satisfy the following condition. 

"Every value in the relation (i.e., each attribute value 


in €ach Se 1S atomic (1.e. nondecomposable so far 
as the system is ccncerned)." (Ref. SD. oO 


oc 


That 1s, at every row-and-column position in the takle there 
always exists precisely one value, never a set of values. 
But in the case of having "unknown" or "inapplicable" 
values, null values can be allowed to represent these 
special values ina relation. This is the idea of normaliza- 
tion. If a relation satisfies the above condition, itis 


said to be normalized. This idea will be discussed in detail 


later. 
| es 
. TCMAIN MILITARY ID GHARAC TER) | 
| DOMAIN MILITARY_FANK CHARACTER (4) | 
DOMAIN OFFICER NAME CHARACTER (20) | 
| CCMAIN LOCATION CHARACTER (15) | 
| RelAltON OFFICER | 

(MID : DOMAZN MALITARY ID, 

RANK ; DOMAIN MILITARY_RANK, 
| NAME ; DOMAIN OFFICER NAME, 
| Guay: : DOMAIN LOCATION) | 
| 


0 ee ee 


: 


Figure 6.3 Domains and Attributes. 


The generalized format or notation which is used to 
represent a relation is called the relation structure. for 
example, OFFICER (Mid, Rank, Name, City) is the structure of 
the OFFICER relation. In general, Relation_name (attri-. 
butel, attribute2, ... ,attributeN) is the general format to 
show the structure of a relaticn. IZ we add constraints on 
allowable data values to the relation structure, we then 


have a relational schema [Ref. 11]. 
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2. Keys 


It is frequently the case that within a given rela- 
tion there is one attribute with values that are unique 
within the relation and thus can be used to identify the 
tuples of that relaticn. Attribute MID of the OFFICER reia- 
tion, for example, has this property. Each OFFICER turfle 
contains a distinct MID value, and this value may be used to 
distinguish that tuple from all others in the relaticn. MID 
is called the primary key for OFFICER. 

A single attribute may not always be the primary key 
in’a relation. However, the values of more than one attri- 
bute together may constitute a unigue identifier. Thus, scme 
combination of attrikutes, when taken together, have the 
unigue identificaticn property. In the relation 
COURSE_ATITENDED (Fig. 6.1), for example, )the Combination 
(MID, CCODE) aas this property. The existence of sucha 
combination is guaranteed by the fact that a relation isa 
set. Since sets do not contain duplicate elements, each 
tuple of a given relation iS unigue with respect to that 
relation, and hence at least the combination of all attri- 
butes has the unigue identification property. In the above 
example, the combination (MID, CCODE) is said to bea 
composite key as well as a frimary key for the OFFICER 
relation. 

On the other hand, occasionally we may encounter a 
relation in which there is more than one attribute comrina- 
tion having the unigue identification property and hence 
more than one candidate key. In such a case we may arbi- 
trarily choose one of the candidates as the primary key for 
the relation. If a candidate key is not the primary key, it 
is called an alternate key [Ref. 3:pp. 83-93]. The COURSES 
relaticn in Fig. 6.1 (b) “is SUch a E@Wation. Each course 


has a wnigue course code and a unigue course name (TITLE). 


70 


If the designer chocses one of these candidate keys, say 
CCODE, as the primary key for the relation, TITLE will ke an 
alternate key. 
The primary key is a unigue identifier for tuples in 
a relation. Those tuples represent entities in the real 
world, and the primary key really serves as a unigue identi- 
fier fcr those entities. For example, the tuples in the 
CFFICER relation represent individual officers, and values 
of the MID attribute actually identify those officers, not 
just the tuples that represent then. AS a result of this 
interpretation, we can aow introduce the following rules 
"Integrity Rule 1 (Entity integrity) 
No component of a primary key value may be null." 
bets osp. oY | 
According to the definition, all entities must have 
a unigue identificaticn of some kind. That is, they rust be 
distinguishable from each other. Primary keys perform the 
unigue identification function in a relational database. If 
tepemaeyekecy Value is null in*a*’ relation, this implies that 
there is some entity that does not have a unique identifica- 
tion. In cther words, it is not distinguishable frem cther 
entities. It is stxrcengly reccmmended that both wholly and 
Pabtiialtivenuil adentifiers be fprohibited. 
Those types cf arguments lead us to a second integ- 
Paint ream Ooccaslondally one felation includes references to 
another. Reiation COURSE_ATTENDED, for example, includes 
references to both the OFFICER relation and the COURSES 
relation, via its MIIT and cCCCDE attrikutes. It is clearly 
seen that if an occurrence or a tuple of COURSE_ATTENCDED 
contains a value for MID, say ID2, then a tuple for officer 
ID2 should exist in OFFICER. Otherwise, the COURSE_ATTENDED 
tuple would refer to an nonexistent officer; and similarly 


for ccurses. To make these notions clear, we snould under- 
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"A given (ona ey optionally be designated as s prisary 

if “ands ont y there exists. some aaa hl 
primary key Sane on that domain. ™ “en.” 3: p. 

For example, we may designate the domain MILITARY_ID as 

primary, by extending its definition shown in Fig. 6.3 as 


follows: 
DOMAIN MILITARY _ID CHARACTER (9) PRIMARY 


Any relation which contains an attribute that is 
defined on a primary domain (for example, relation 
COURSE_AITENDED) must obey the following rule. 


ra ea oo! Rule 2 (Referential integrity) 

Let D be a primary domain nd let RI be a relation with 
an attribute A that is defined on D. Then, at any given 
time, e€ach value of A in R1 must be either (a) i. Os 
(b) ein to _V, say, where V is the Wee key value of 


some tuple in some relation R2 (RI and not neces- 
Sab iy, Gale) With primary key defined on 0." 
[LRet.. 3: peo 


Here, relation R2 must exist because of the defini- 
tion cf primary domain, and if attribute A is the primary 
key of K1, the rule is trivially satisfied. When an -attri- 
bute such as A in one relation is a key of another relation, 
the attribute is called a foreign key. For example, attri- 
bute CCODE of relation COURSESAITENDE=D 1s avtoremqn <cye 
because its values are values of the primary key of the 
COURSES relation. 


3. Extentions and Intentions 


An extention andan intention are actually ccmfpo- 
nents of a relation in a relational database. 

The set of tuples existing ina given frelaticn at 
any given instant is known as the extention of that rela- 
tion. Thus the extention changes with tine. That. 2s 7 
varies depending upcn the several operations performed on 


tuples which are added, deleted, and updated. 
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The intention of a given relation is the permanent 
part of the relation. It is independent of time. The inten- 
tion corresponds to what is declared in the relational 
schema. Hence, the intention is the combination of the rela- 
tion structure (sometimes called the naming structure) 
mentioned earlier and the integrity constraints which can be 
subdivided into key constraints, referential constraints, 
and other constraints. Key constraints are constraints 
implied ky the existence of candidate keys. The primary key 
specification and the alternate keys specifications included 
by- the intension imply a uniqueness constraint (by the defi- 
foeeeen Of Candidate key) = and a mno-nuli?s constraint (by 
Tategrity Rule 1) respectively. Referential constraints are 
constraints implied Ey the existence of foreign keys. A 
specification of all foreign keys in the relation implies a 
referential constraint (by Integrity Rule 2). The relations 
in Figure 6.1 are examples of extentions and they alsc show 
the intentional relation (or lamin g) structure which 
consists of the relation name plus the names of the attri- 
butes. The operational data appearing under those attritutes 


are the extention part of those tables. 


B. RELATIONAL ALGEBRA 


Relational algekra is a collection of operations on 
relations. Fach operation takes one or more relations as its 
operand(s) and produces another relation as its result. A := 
E+ C;, for example, is an arithmetic expression in PASCAL 
Programuing Language. B and cC are operands known as vari- 
ables for the additicn operator {+). After performing this 
operation, the result will be assigned into the variable A. 
Likewise we encounter B and C as two relations and plus sign 
(+) as union operator in the relational algebra. After this 
operaticn is performed, A will be a new relation produced by 


that cperation as its output or result. 


Us 


The relational alcebkra basically consists of twec groups 
of oferators: the set operators union, difference, intersec- 
tion, and product; and the special relational operators 
selection, projection, join, and division. These operaticns 
are very important in order to understand the other high- 
level relational languages such as SQL, QBE which will be 


discussed later in this Chapter. 


1. Set Operators 





The traditioral set operators are union, difference, 
intersection, and CarteSian product. The two relations used 
as orerands must te union-compatible for all except 
Cartesian product. This means that each relation must have 
the same number of attributes (Same degree), and the attri- 
kutes in corresponding columns must come from the same 
domain (the names of the attrikutes need not be the same). 
[Ref. 3:pp. 203-215], [ Ref. 6spp. 242-282 } 

-Union 

The union of two relations is formed by combining 
the tuples from one relation with those of a second relation 
to produce a third. In other words, the union of two rela- 
tions A and B, A UNION B, is the set of all tuples t 
Eelonging to either A or B (or both). Duplicate tuples are 
eliminated. For example, let A be the set of officer tuples 
for officers stationed in Monterey, and B the set of officer 
tuples for officers who took course C2. Then A UNION 8 is 
the set of orficer tuples for officers who either are 
Staticned in Monterey or took course C2 (or both). 

-Difference 

the difference of two relations is a third relation 
containing tuples which occur in the first relation but not 
in the second. That is, the difference between twe (unicn- 
compatible) relations A and B, A MINUS B, is the set of all 
tuples t belonging to A and not to B. For example, let A and 
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B be the same sets as in the example under "Union". Then A 
HINUS B is the set cf officer tuples for officers whe are 
staticned in Monterey and who did not take course C2. 
-~Intersection 

The intersection of two relations is a third rela- 
tion containing common tuples. Again, the relations must be 
union-ccmpatible. Mathematically speaking, the intersection 
of twce relations A and B, A INTERSECT B, is the set of all 
tuples t beionging to both A and B. Let A and B again, for 
example, be as in the example under "Union" above. Then A 
INTERSECT B is the set of officer tuples for officers who 
are stationed in Mcnterey and took course C2. 

-~Cartesian product 

The Cartesian product of two relations is the 
concatenation of every tuple of one relation with every 
tuple of a second relation. Let A and B be two relations. 
Then A TIMES B or A xX B is the set of all tuples t such that 
t is the concatenatico of a tuple "a" belonging to A anda 
tuple "rk" belonging to B. The concatenation of a tuple a = 
ciemeepan) and a tuple b= (bi,...,5N), in that order, is 
miemcupte t — {a1,...,al,bu+t,...,bDM+N). For example, let A 
Be the set of all officers! military identification numters, 
and B the set of all course code numbers. Then A TIMES 8B is 
the set of all pessible military_ID_ number/course_code 


pairs. 


2. special Relaticnal Operations 


-Pro jection 
Frojection iS an operation that selects specified 
attributes from a given relation. The result of ithe 
rrojecticn is a new reiation having the selected attributes. 
In other words, the projection operator creates a "vertical" 
subset of a given relation obtained by selecting specified 


ateEEibutes, in a specified left-to-right order, and then 
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eliminating duplicate tuples within the attributes selected. 
Projection can also te used te change the order of attri- 
butes in a relation. FOr example, consider the 
COURSE_ATTENDED relation in Figure 6.1 (c). The prcjection 
of COURSE_ATTENDED cn Ccode and Grade attributes, denoted 
with brackets as COURSE_ATTENDED {Ccode, Grade}, is shown in 
Figure 6.4 Note that alithough COURSE_ATTENDED has eight 
tuples to begin with, the projection COURSE ATTENDED {Ccode, 
Grade} has only six. Two tuples were eliminated Lecause the 
tuple {[{C1, A-} and {C3,- A-} occurred twice (after the 
projection was done). Another example of reordering the 
attributes within the OFFICER relation is to write a state- 
ment for projection such as OFFICER {City, Name, Rank, Mid}. 


—— | 

| CODE ali oC RAD eae 
| ai | 
| ee | 
| ana | 
| C4 | Gt 
| Sg SS SSS SS SO es SS Se 
a | 


Figure 6.4 Projection of COURSE ATTENDED Relation. 


-Selection 
The selection operator yields a "horizontal" subset 
(rows) of a giver relation. in other evonds?) celecticwaen. 
tifies tuples to be included in the new relation. Selection 
is denoted by specifying the relation name, followed by the 
keyword WHERE, followed by a conditional statement inveclving 
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attrikutes. The condition is a single or a combinaticn of 
Boolean expressSion(s). Figure 6.5 (a) shows the selecticn of 
the relation COURSES WHERE LOCATION = 'MONTEREY'. Figure 6.5 
(6) shows the selection of COURSES WHERE DURATION > 12. 
Figure 6.5 (c) shows the selection of COURSES WHERE DURATION 
> 12 AND LOCATION = 'INDIANAPCLIS'. 








| |Se225) -ZEZEE. _CDESCRIPT _______..|_DUR:|_LOCATION | | 
| 2 cz Cobol Cotols ProgGasiang.» | 8 Uonterey 
| C& | WNepsys| Weapon Systems / 114 {| Monterey | 
(a) | 

fee eee ee eet ae me ee 
| Ca Adp Auto. Data peosce==t 16 In.polis 
| |} C4 | Fepsys] Weapon Systems ; 114 Monterey | 
5 | 

cee we ee es ee ee ee ee we oe ee ee ee we es re ee ee we we wn ew we we we we a en we er we we ew ww wr ee oe = | 
(CCODE| TITLE |CDESCRIPT | DUR} LOCATICN | | 

1 C1 J] Adp [Autc. Data Process. | 16 | In.polis | | 

| 
! (Cc) 


Figure 6.5 Selection of COURSES Relation. 


=o Can 
The join operation is a combination of the product, 
selection, and ({possiktly) projection operations. The jcin of 
two relations, say A and B, is denoted as A JOIN B which is 
equivalent to taking the Cartesian product of A and _ E and 


then performing a suitable selection on that product. If 
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necessaly, duplicate attributes can be eliminated by 
projection. The Join operation iS a binary operation since 
it operates on two relations but selection and projection 
are oferations on Single relations (i.e., they are unary 
operations). 

Actually there are many possible join operations in 
which the "joining condition" is based on equality or 
inequality between values in the common column of two rela- 
Taro Tess Those operations are usually called egquijoin, 
greater-than join, less-than join, and natural join. A 
natural join iS an e€guijoin with the elimination of dupli- 
cate cclumns, and if a common relational operation. for 
example, the egquijoin and the greater-than join can prcduce 


the same result as the expressions 


(A TIMES B) WEERE A.X = B.Y 
(A TIMES B) WHERE A.X > B.Y 


where A andsB are relations, and X and Y are attritutes 
belong to A and B, respectively. The values of attributes X 
and Y must ke derived from some common domain. Consider the 
OFFICER and COURSES relations shown in Figure 6.1 (a) and 
(b) - Tables OFFICER and COURSES may be joined over their 
CITY and LOCATION attributes; the result is shown in Figure 


6.6 We denote such a join as 


(OFFICER JOIN COURSES) RO OF Pie eee Chrys. = 
CCURSES. LOCATION. 


The jcin in Figure 6.6 1S an egqumjolaeee it erie duipgiacan. 
attributes (CITY and IOCATION) were eliminated, then the new 
relaticn would be created as a result of the natural jcin 
operation. 
-Division 
The division cperation has a binary relation R (X,Y) 


as the dividend and a divisor that includes Y. The result is 
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ed ee Smith|Ber.| C3 Digel Dig. Mach. “12 | Berkeley 
= Fr Rueh, hon. | -C2 Cobol] Cob. Prog. 8j| Monterey 
| a pe heen. | 1 O De .Cz Covpedi{Cop. Prog.) TZ iNentereyy 


Figure 6.6 Join of OFFICER and COURSES over CITY and 10C.. 


a set, S, o£ values of X such that x belongs to S if there 
is a turfle (x,y) in R for each y value in the divisor. 
[Ref. ispp. 15-48] 


i cece ee paces natn OS eae axaepa Ai aa aap Cait getines SD seat OE peaegaes Sen emma 


COUESE: COURSE_LOCATION: CODES; 

[| CCODE|LOCATION | | LOCATION] i CCOLE a 
C1 {Ind.Pol. Monterey | C24 
G2 Monterey Berkeley 


C3 | Berkeley 


C4 jMonterey 


i; C2 | Berkeley 


ines fee 


SS A> i en ee ene eee eee ee ene eee eee eee eee eS 


Figure 6.7 The DIVISION Operation. 


Figure 6.7 illustrates this operation. If relation 
COURSE is the dividend and relation COURSE_LOCATION is the 
Givelsor, then CODES = COURSE/COURSE LOCATION. In the Figure, 
C2 is the only course code for which there is a tufle with 
Monterey and Berkeley (i.e., <C2, Monterey> and <C2, 
Berkeley>) in COURSE relation. The other course codes, Cli, 


Gepmmandecl ao not satisfy this condition. .- 
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C. DATA SUELANGUAGES FOR RELATIONAL DATABASES 


The relational database model must provide languages to 
access relations. A number of relational data suklanguages 
(DSLS) have been proposed and developed. Because of the 
tapular structure of relations, users can easily understand 
relational DSLs. Ancther important feature of a relational 
CSL is its selective rower. Relational DSIs should have the 
capability to retrieve data that satisfy any condition over 
any numker of relaticns. [Ref. l:p. 36] 

Early relational languages were based on selective 
fower. Codd [Ref. 12] gave the definition of the relational 
model in 1970 and defined the basis for relational languages 
such as relational algebra and relational calcula. 
Relational calculus has particular significance. It isa 
form of predicate calculus specifically tailored to the 
relational databases andis used to measure the selective 
power of relational languages. A relational language is 
relationally complete if it can produce any data that can be 
obtained from a relational calculus expression. fRet. isp. 
56 | 

Data Sutlanguage ALPHA, which is based on relational 
calculus, was presented by Codd [Ref. 12]. DSL ALPHA itself 


waS never implemented, but a language very Similar to it, 
called QUEL, was used as the query language in the rela- 
tional CEMS, called INGRES [Ref. 13]. We will discuss 


INGRES in more detail in Chapter VIII. 

Another widely used Data Sublanguage is Structured Cuery 
Language (SQL) which is used for and is currently inrple- 
mented by the System R relational database management system 
that runs on the IBM Systemn/370 [Ref. 14]. SQL rrovides 
retrieval functions and a full range of update oferaticns, 
and also many other facilities. It can be used both frem an 


on-line terminal and, in the form of “embedded SQL," from an 
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application progran, Daten Or On-line, written in either 
COBOL or PL/I. f{Ref. Szppe) 145-156] The basic format, for 
example, of SQL is in the forn 


Seneca <attErizente> 
FROM <relaticn> 


WHERE <conditional expression> 


Query-by Example (QBE) is also an another relational 
system designed for users who are not programmers. It has 
approximately the same selective power as SQL but uses a 
graphical interface. It is therefore suitable only for 
terminal use and cannot be embedded in a host language. 
Ret. Ispp. 181-200] Query-by Example is an artificial, 
self-contained, user-directed specification language, 

So far we have examined several aspects of database 
systems in general and relational dataktase model in partic- 
ular. But we have net yet answered the following question: 
After having a body of data to be represented in a datakase, 
how do we decide what relations are needed and what their 
attrikutes should be? This is the database design proclen 


which will be discussed in the next Chapter. 
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VII. RELATIONAL DATABASE DESIGN 


A. INTRODUCTION 


In designing relational databases, the primary goal is 
to ensure tnat relaticns represent the original data speci- 
ficaticns correctly and without redundancy. The major 
concert for the relational database design is the normaliza-~ 
tion jenocese. that is, the process of grouping the data 
elements into relaticns representing entities and their 
relationships. The idea of normalization is based on the 
observation that a certain set of relations has better frop- 
erties fcllowing the database operations, such as inserting, 
updating, and deleting, than do other sets of relaticns 
containing the same data. In cther words, the objective of 
normalization is to froduce a database design that can be 
Manipulated in a powerful way with a simple collection of 
operations while minimizing update anomalies and data incon- 
Sistencies [Ref. 15spp. 99-126}. Normalization theory is a 
useful aid in the database design process, but it is not an 


€xact solution. 


Be NCRMAL FORMS 


Normalization thecry is traditionally expressed through 
a set ci so-called ncrmal forms that progressively constrain 
the structure and contents of a relation. A relation is said 
to be in a particular normal form if it satisfies a certain 
specified set of constraints. 

There are numerous normal forms which have been defined 
ky the relational theorists. As shown in Figure 7.1 
[Ref. 3:p2 239] eack of these normal forms contains’ the 


other. If a relation, for example, is in third ncrmal forn 


82 


eur}; then it is automatically in first and second nerusal 

forms. Nene of these normal forms will eliminate all anoma- 
lies; each normal form would eliminate just certain anoma- 
lies. But R. Fagin [Ref. 11] defined a new ncrmal forn 
called domain/key normal form {DK/NF), and he showed that a 
relation in DK/NF is free of all modification anomalies, 
Legardiess of their type. The point is to find ways tec rut 
relations in DK/NF. If the database designer does this, then 
he 1S guaranteed that those relations will have nc anonma- 
lies. Unfortunately, it is net even known if all relations 
can be put into DK/NF. At this point we need the ccncerft of 
functional dependency to define these relaticnal normal 
forms. 


1. Functional Dependency 


Functional dependency (FD) is aterm derived fron 
mathematical theory; it relates the dependence of values of 
one attribute or set of attributes on those of another 
attribute or set of attributes. Formally, an attribute (or 
set of attributes), Y, ina relation is said to be function- 
ally dependent on ancther attribute (or set of attrikutes), 
X, if knowing the value of X is sufficient to determine the 
value of Y. To put it another way, there is only one value 
of Y associated with any value of X. The notation X-->Y is 
often used to denote that Y is functionally dependent on X, 
and is read: X functicnally determines Y. The attribute (or 
set of attributes) X is known as the determinant of the FD 
%-->Y. It is obvious that the nonkey attributes of any rela- 
tion are functionally dependent on the key. 

To illustrate the -Easic principles of functional 
dependencies, consider the sample database in Figure 6.1. 
The attribute TITLE in relation COURSES is functionally 
dependent on CCODE because each course haS one given title 


value. Thus once a ccurse code is known, a unique value of 
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Figure 7.1 Relaticnal Normal Forms. 


course title is immediately determined. The FD for this 
éxanple is Shown as CCODE =—> Tim: 

Likewise, ir relation COURSE ATTENDED, once values 
for officer ID (MID) and CCODE are known, a unigue value of 
GRADE for that officer in that course is determined. This FD 
is defined as MID,CCOLE --> GRADE. 

It 1S convenient to represent the FDS in a given set 
of relations by means of a functional dependency diagram, an 
example of which is shown in Figure 7.2 It is also fossible 
to have two attributes that are functionally dependent on 
each other. In this case both CCODE ==> TITLE and 2i0le ——~ 
CCODE hold (because TITLE 1S an alternate key for the 
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metation COURSES). The notation CCODE <---> TITLE is 


commonly used to illustrate such mutual FD. 


SN ee 
[eae 


CEFICER: COURSE ATTENDED: | 
{ 
a. >| RANK | | | 
| | MID | | = 
manana se Hee ---->] GRADE } 
ee eis o>|) NAME | eee) -------- aa aM 
aa ee ee | CCODE | 
! |. og hi | | 
Riaeeeatt TTT ieee ieee | 
| i 
| COURSES: | 
| ~--=---->] CDESCRIET | 
| ees | | 
| | |n-=|---21 DORATON | | 
| ac | 
| Bs ee | 
| TITLE |------> | | 
| -----+--- v | 
| -—--->| LOCATION | | 
— hm | 
eRe ee SaaS | 


Figure 7.2 Functional Dependency Diagrams. 


We also need to introduce the concept of full func- 
tional dependency. This term is used to show the mininun 
set of attributes ina determinant of an FD. fRee. 13 ppe 
15-48] Attribute {or set of attributes) Y 1s said to be 
fully functionally defendent on attribute (or set of attri- 
butes) xX if Y is functionally dependent on X and Y is not 
functionally dependent on any proper subset of kX. For 
example, in the relation COURSE_ATTENDED, the attribute 


GRADE is fully functionally dependent on the attrikutes 
(MID, CCODE) because it is not functionally dependent on 
either MID or CCODE alone. On the other hand, in the rela- 
tion COURSES, the attribute CDESCRIPT is functionally depen= 
dent cn the attributes (CCODE, TITLE); however, it is not 
fully functionally dependent on those attributes because, it 
is also functionally dependent on either CCODE or MITLE 


alone. 


First normal form (1NF) deals with the "shape" of a 
record type or a tuple. Under first normal form, all tuples 
in a relation must have the same set of attributes, and the 
attrikutes must be atcmic (indivisible items). This defini- 
tion merely states that any ncrmalized relation is in first 
normal form. 

When determining whether a particuiar relation is in 
hormal form, the FDs between the attributes in the relation 
must ke examined. Fcr this reason, we will use a notation 
which was first progosed by [Ref. 16] to point out these 
relaticnal characteristics. In the notation, the relaticn is 
defined as divided into two components: the attributes and 


the FEs Eetween them. The format is 
Ro eee) go me pe) 


Ris the name of the relation, i, Y, and Z are the attri- 
butes, and X-->Y, X-->Z are FDSS” For exanplere neti gure uo! 
the relation COURSE_AITENDED is defined as 


COURSE_ATTENDED=({ {MID,CCODE,GRADE}, {MID,CCODE-->GRAD}} ) 


Many update and deletion anomalies can be eliminated 
by converting a relation to second normal form (2NF). Second 
normal form requires that all nonkey attributes must contain 


informaticn that refers to the entire key, not just part of 
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it. in cther words, a relation is said to be in 2NF if and 
only if it is in INF and every nonkey attribute of the rela- 
tion is fully functionally dependent on the primary key. The 
relation UNIT_ASSIGNEL, for example, in Figure 7.3 is in 1NF 
but not in 2NF because the + nonkey attribute GSTATUS 
(geographical status) is not fully dependent on the primary 
key UCODE (unit code) and MID. Here GSTATUS is fully func- 
tionally dependent on OUCODE, which is a subset of the 
primary key. 








7 > i. (on 
Relation: UNIT_ASSIGNED Key: UCODE, MID 
| _UCODE | LOCATICN | GSTATUS | MID | DATE | | 
““31 | Monterey | 100. | ID1 | 012583, 
“01 | Monterey | 100 | ID& {| 042385 
“U1. | Monterey | 100 | mp5 | 012581. 
“ui. {| Monterey | 100 | mp2 | 110182 | 
| "52. | Newyork | 200. | zD6 | 083084 | 
| | 3] penver | 300. | zD3 | 072082 | 
| |_03._| Denver | 300 | 1b5 | 100564 | 
! 2 J eI Os 9) Ee) ee ee | 
a —~ | 


Figure 7.3 Relation in INF but not in 2NF. 


In Figure 7.4 relation UNIT_ASSIGNED has been decon- 
posed into two relaticns, UNITS and ASSIGNMENT. Both rela- 
tions are in 2NF. Note that the relation UNIT_ASSIGNED 
suffers from wmodification anomalies with respect to ufdate 
operations. Figure 7.5 also illustrates the FDs for botn 
relations. 

Froblems occur with each of the following three 


basic operations. 
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UNITS key; UCODE ASSIGNMENT key: MID,UCOLDE | 
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Figure 7.4 Relations in 2NF. 
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Figure 7.5 FD Liagrams for UNITS and ASSIGNMENT. 
Inserting: We cannot enter the fact that a particule re mnest 


is located in a particular city until at least one officer 


is assigned to that unit. 
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| 
| UCL Key: UCODE ULG Key: LOCATION | 
| GCODE | LOCATION | | _LOCATION | GSTATUS | 

| U1 ‘| Monterey “Denver | 300 | 

| U2 | Newyork — “Monterey {| 100 
03. -| Denver “Newyork | 200 | | 

| rT [aerrose pm | 

| | 


Figure 7.6 Relations in 3NF. 


The original definition of 3NF was subsequently 
replaced by a stronger definition known as Boyce/Codd Nermal 
Form. (BCNF) which can be defined as follows. 

only) it R is a Boyce/Codd gene FOrm {Bene a and 

Ret. 3 ts sist eterminant is candidate ey. 
The original 3NF definition does not satisfactorily handle 
the case of a relaticn that has more than one candidate key, 
and modification ancmalies arise with this: definition when 
it 1s used with such relations. BCNF is often used to remove 
these anomalies. Fer example, consider the relation 
UNIT ASSIGNED (Fig. 7.3 ) and the FDs between the attrikutes 


of that relation such as 


UNIT ASSIGNED= ({UCCDE, LOCATION,GSTATUS,MID, DATE}, 
fUCODE-->LOCATION, UCODE-->GSTATUS, LOCATION-->GSTATUS, 
UCODE,MID——> DATE} ) 


Here the relation UNIT_ASSIGNED contains three determinants 
but only the determinant (UCODE, MID) 2S a candidate «ey. 
Therefore UNET_ASSIGNED 1S not BGNF. Sinamay UNi toa 
7.4 ) 1s not BCNF, because the determinant LOCATION is rota 
candidate key. On the other hand, relations ASSIGNMENT, UCL, 
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and ULG are each BCNF, because in each case the candidate 
key is the only determinant in the relation. 

BCNF is conceftually simpler than 3NF since it does 
not reference the concepts of primary key, transitive depen- 
dency, and full dependency. Although BCNF is stronger than 
3NF, it is still true that any relation can be decomposed in 
a nonloss way into an equivalent collection of BCNF 


relations. 


, bBortm and fifth pmermal forms deal swith nultavalued 
attributes. A multivalued attribute may correspond toa 
Many~tc-many relaticnship, as with officers and skills, or 
to a many-to-one relationship, as with the children cf an 
officer. By "many-to-many" we mean that an officer may have 
several skills and/or a skill may belong to several offi- 
cers. When we look at the many-to-one relationsnip between 
Children and fathers, it is a single-valued fact about a 
child kEut a multivalued fact akout a father. In some sense, 
UNF and SNF are alsc related with composite keys. these 
normal forms attempt to minimize the number of attritutes 
involved in a composite key. [Ref. 17] 

Forth normal form is based on the concept of nulti- 
valued dependency (MVD). The notation X-->>Y is used to 
indicate that a set of attribute Y is nultidependent ona 
Ser Of attributes of X. Pormally, MVD is defined as follows: 
Given a relation R with attributes X, Y, and Z, the multiva- 
lued dependency X-->>Y holds in Rif and only if the set of 
Y-values matching a given (X-value, Z-value) pair in R 
depends oniy on the X-value and is independent of the 
Z-value. The attributes X, Y, and Zmay be composite. 
[eeet. 32pp. 237-265 } 

Multivalued dependencies which have been defined can 


exist only if the relation R has at least three attributes. 


91 


It is easy to show that, in the relation R(X,Y,2Z), the MVD 
X-->>Y holds if and cnly if the MVD A-->>C holds. 

Before giving the definition of 4NF, it is conven- 
ient to state the following theorem proved by Fagin in 
[Ref. 18]. 

"Relation R, wath attributes XxX, 4, and Z, can be 


nonloss-decomposed into its two Speke LS te Oh R1(X,Y¥) and 
R2 (X,Z) 1£ and only if the MYD X-->>Y,2 bolds in hee 


Now fcurth normal fora (4NF) is defined as follows: 


"A relation R is in fourth normal form (4NF) if and only 
if, whenever there exists_an MVD in R, Say X-->>Y, then 
Fala attributes of R are also < Dae eee dependent on X 
Seas R==>ZetOE sald] attributes “Zaee )." [Rebt. 32D. 


Fagin also proves (see [Ref. 18] ) that NF 1S stricri, 
stronger than BCNF (i.e., any 4NF relation is necessarily in 
BCNF), and any relation can be nonloss-decomposed into an 
equivalent collection of 4NF relations. 

Fifth normal form  (5NF) deals with cases where 
information can be reconstructed fron smaller pieces of 
information which can be maintained with less redundancy. 
2NF, 2NF, and 4NF alse serve this purpose, but SNF general- 
izes to cases not covered by the others. Aho and co-workers 
in 1979 [Ref. 19] discovered relations that cannot be 
nonlosslessly decomposed into two relations but can te loss- 
lessly decomposed inte three or more relations. Because of 
this Eroresiiw. S5NF is also called projection-join normal 
form, andis based on the concept of join dependency (JD) 
which is a more general case of an MVD. In general, relation 
R satisfies the JD *(%,Y,..-,%8) 2 Sanedeenk, helt foe 
join cf its projecticns On 3,V¥,.capopeewmene GA, Yl pes, coe 
subsets of the set of attributes of R [Ref. 3:pp. 237-265). 
We can now define 5NF given by [Ref. 3:p. 262]. 

"A relation R is in. fifth ncrnal £ormmionm)-also called 
eee pgenee c= 117.2 normal form ({PU7NE)— See and only rr 


every 10am ependency in R is implied by the candidate 


keys o : 


Fz 


Since a JD is a more general case of an MVD, any relation 
which is in 5NF is necessarily in 4NF. But determining that 
a relation is in 5NF is less’ straight-forward than UNF, 
BCNF, etc. because discovering join dependencies is a 
nontrivial task. 


4. Lomain 7 Key Xormal Form 


Pies, Re Fagin [Ref. 11] defined a new nermal 
form called domain/key normal form (DK/NF). In his paper he 
proved that a relaticn in'DK/NF will have no insertion or 
deletion anomalies. He also showed that a relation having no 
modification anomalies must be in DK/NF. DK/NF is based on 
only the concepts of key and domain. These concepts are 
readily known and suprorted by DBMS products. The definition 
of DK/NF is guite simple. 

"A relation is in CK/NF if every constraint on the rela- 

€10N as a logical ccnsequeuce of the definition of keys 

and domains. (Ref. 6:p. 299] 
im this derinitaon, censtraint is a broad term. Any rule on 
static values of attributes that can be evaluated precisely 
whether or not it is true is said to be a constraint. Thus 
PDS, MVDs, JDs, and edit rules are all examples of 
constraints. Some ccnstraints which have to do with changes 
in data values are excluded from the definiticn of 
Constraint. 

DK/NF relaticn requires that if keys and domains can 
ke defined such that ali constraints will be satisfied when 
the key and domain definitions are satisfied, then modifica- 
tion ancmalies are impossible. But there is no known way to 
put a relation in [CK/NF automatically. In spite of this 
problem, DK/NF can te extremely useful for practical data- 
kase design. DK/NF is a deSign objective. Database designers 
wish to define their relations such that constraints are 
logical consequences of domains and keys. This goal can be 


accomplished for many designs. 
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Seven normal forms have been discussed and are 


Summarized in Figure 7.7 [Ref. 6:p. 305]. 





FOr Defining Characteristic 
INF Any relation. 
2NF All nonkey attributes are dependent 


on all of the keys. 


(a 
= 
hy 


There are no transitive dependencies. 
UNF Every MVD‘is a functional dependency. 
oNF Join dependencies are satisfied. 
DK/NF All constraints on relations are logical 


consequences of domains and keys. 


[|e ny ed 


Figure 7.7 Summary of Normal Forms. 


C. RELATIONAL DESIGN PROCEDURES AND CRITERIA 


1. Design Procedures 


The relational model is attractive in database 
design since it provides formal criteria for logical struc- 
ture, namely, normal form relations. In order to prcduce 
those relations, database desiyners should choose a design 
procedure. Two different approaches have been proposed: 

"1.,Deccmposition precedures. These commence with a set 
of cne or more relations and decompose nonnormal rela- 
tions 1n this set into normal forms. 
eS eos procedures. These commence with a_ set of 
functicnal dependencies and use them to construct normal 
Lorm réelatzome." [kef. Vepee oor 
In practical situations, synthesis procedures are mecre 
attractive than deccmposition procedures. Many algoritams 


have keen proposed for relational design and each algorithn 
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produces relations that satisfy some subset of the _ rela- 
tional design criteria which will be discussed in the next 
SeGraon . 

Decomposition algorithms start with one relaticn and 
successively decompose it inte normal form relations. The 
relations in 3NF and BCNF are not sufficient for applying 
these decomposition algorithms, so the ideas of MVD and 4NF 
have to ke known. Synthesis algorithms, on the otker hand, 
start with a set of FIs and synthesize them into ncrmal forn 
relations. In other words, these algorithms use FDs to 
produce normal form relations. Detail information about 


design algorithms can be found in [Ref. 1l:pp. 59-88}. 


2. Relational Database Design Criteria 


This Section presents several different design 
criteria which have been identified in {Ref. 6:pp. 307-311] 
and [Ref. 16] for froducing an effective relational 
datakase. 

a.Elimination of Modification Anomalies 

The objective of this criterion is to eliminate 
all ancmalies resuiting from databaSe:-operations. AS we have 
seen, if relations are in DK/NF, then no modification anonma- 
lies can occur. This is why DK/NF is a design objective. The 
problem is to find a way that all relations can be put into 
DK/NF. 

rE.Relation Independence 

According to this deSign goal, two relations are 
said to ke independent if modifications can be made to one 
without regard for the other. However, this criterion is not 
always achievable. Interrelaticn constraints allow relations 
to be depencent. To eliminate this dependency the relaticns 
can be jcined toyether. After the join operation, the new 
relation may have modification anomalies. To eliminate these 


anomalies, relations are decomposed into two OL more 
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relations; but this cperation creates interrelation déepen- 
dencies again. Here we see the conflict in design goals. In 
this casé€é we must chcose the least of the evils, based on 
the reguirements of the application. | 
c.Ease of Use 
This third criterion for a relational database 

design makes the relations seem natural to users. As far as 
possikle, designers should attempt to structure the rela- 
tions so that they are familiar to users. From time to time 
this criterion conflicts with the other two criteria. 

: d. Representation 

This relaticnal criterion states that the  ienae 
Structure has to correctly represent the original specifica- 
tions. That is, all the relations in the output design 
process must satisfy the conditions for normal form. = 
Beeri and co-workers have defined three important feints for 
the representation of a set of relations, Sin, in the input 
design process by a set of relations, Sout, in the output 
design process (Sin ard Sout are sets of relations used in 
the input and output design processess) : 

> eee a ea Sout contain the same attributes 


-REP2: The relaticns Sout contain the same attributes 
and the sare FDS as Sin. 


-REP3: The relaticns Sout contain the same acts sane 
and the sare data aS Sino’ ™ [| terete O50) 

The first representation, REP1, requires all the 
attributes in Sin to also be in the pelatvons sin Sout. jeu 
it does not address any dependencies between the attributes. 

In regard to REP2, representation reguires that each 
FD in Sin kre either contained as an FD in one of the rela- 
tions in Sout or derived from the FDs in the relations in 


sout, using the FD inference rules. 
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The third representation criterion, Reva also 
requires that the relations in Sout contain exactly the same 


tuples as the original relations in Sin. 


e. Separation 
The separation criterion means that the originzal 
specifications are separated into relations that satisfy 
certair conditions. As we have discussed earlier in this 
Chapter, the database must Fe divided into a number of 
normal fcrm relations. 
£.Redundancy 
This last criterion points out the fact that the 
final structure must not contain any redundant information. 
It is fpossible to define the redundancy criterion in 
different ways. One set of redundancy criteria is shcwn 
Eelow: 
err iemneLetati On in Soutsis cCedundant if its attri- 
puees are ccntained in the other relations in 
-RED2: A. relation in Sout is redundant if its FDs are 
the same or can be derived from the FDS in the 
mane relations in Sout. 
-RED3: A relation in Sout is redundant if its content 
can be derived irom the contents of other rela- 
TLOUS ipeoouts ’ ~—[ Ref. 3p. 
Here, RED1 is not a very useful idea, because during deccn- 
position it is often necessary to create separate relaticns 
that represent FDs between attributes, which may appear in 
other relations. RETZ and RED3, however, can be very useful 
criteria. Any design algorithms should avoid RED3 Fecause it 
would keep the same data in more than one relation. 
The design criteria discussed in this Section caa 
Gonriict. When conflicts occur, the designer has to assess 
priorities and make the best possible compromise in light of 


reguirements. There is no Single rule of priority. 


oa, 


De. TRANSFORMING THE SDM INTO RELATIONAL MODEL 


Figure 7.8 ‘1llustrates the logical design of the 
Personnel database (see SDM in Chapter V). This logical 
schema cannot be used to implement the relational personnel 
datarkase for the follcwing reasons: 

1. Mest of the relations have multivalued attributes. 
Such attributes cannot be used in a relation, 

2. The logical schema allows some tuples to be ccntained 
in other tuples. The relations must be normalized or 
redefined to eliminate these inconsistencies. 

As shown in Figure 7.8, inversion, watching, and deriva- 
tion have been used to provide interrelationships between 
the attributes. Inverse and match functions must be elini- 
nated in crder to achieve DK/NF. During this process, the 
new interrelation constraints should be added. 

lbnitaalie the relationships between OPELGPR a wand 
ACADEMIC EDUCATION, MILITARY EDUCATION, MEDICAL INFO, and 
FOREIGN _LANGUAGE were assumed as one-to-many. For example, 
an officer can have mere than one medical report, and many 
reports may belong to one officer. Such relationshifs were 
descriked by match function in the SDM design. On the other 
hand, relations OFFICER and UNIT have many-to-many relation- 
ships with each other, and these relationships were defined 
by the inverse functicn in the same SDM. The relationship, 
for example, between unit_assigned and officer_assigned is 
Many-tc-many. To eliminate this problem, a new relation 
Calied ASSIGNMENT has been constructed. 


Ey considering all those conditions, frcules, design 
Criteria, etc. described in this Chapter, the resulting 
relational design (relational schema) is illustrated in 
Figure Jer and domain definitions with attribute/dcnain 


corresrfondences are shown in Figure 7.10 and Figure 7.11 


respectively. For simplicity, some attributes are removed in 


oe 


= rey 


OP rier itary ID; ee ey. poner ol Name, 
Birth _ date, Beginning date _to_active duty, 
Native G cue Sex, Marital Status, Number_ 
of _ children, Permanent address, Current 
address, Frimary_branch, Secondary branGi, 
Academic_education, Military_educaTion, __ 
Health coéndition,Foreign language capability, 
Unit_assicned) 


Keyl litary iD ; 
Notes: 1. Academic @€ducation is a contained 
ACADEMIC MAJOR tuple, muitivalued. 

Ze ee Cera is a contained 
MILITARY EDUCATION/COURSES tuple, 
multivalued, | : 

3. Health condition is a contained ME- 
DICAL_ INFO tuple, multivalued. 

4. Foreign_language capability isa 
contained FOREIGN_LANGUAGE tuple, 
multivalued. . 

5. Unit_assigned is a contained UNIT 
tuple, multivalued. 


UNIT (Unit_code, Name, Unit_category, Location, 
Superior unit, fnit ftnction, Officer asSigned) 


Key; Unit_code : 
Note: Orficer_assigned 1s a contained OFFICER 
tuple, multivalued. 


ACALTEMIC_MAJCR (Academic_branch, Academic_degree, AIT, 
Date, Name_of_ university) 


Key: (Academic_branch,Academic_degree, AID) 
MILITARY EDUCATION/COURSES Peete aa Berge) 5 CHOON 
code, Location, MEID, Course/School title, 
Duration, Late, Grade) 
Key; (Course/Military_school_code, Location, 
MEID) 
a cei cal report number, HID, Date, Height, 


pace Biccd_ pressure, Eye condition, Ear 
eonqgition, Internal, General health status) 
Key: (Medical_report_number, HID) 


FOREIGN_LANGUAGE (Name_of_language, FID, Degree_of_ 
capability) 

Key: (Name_of_language, FID) 

ASSIGNMENT REQUEST (Unit_code, Reguest_number, Date, 

pees renee requested, Secondary _branch_ 

requested, Academic_major_reguested, Military/ 
course _education_requested edical_ status, 
Number _of_person, Number of requests) 


Key: (Unit_code, Reguest_number) 


Figure 7.8 Summary of Logical Design. 
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the relational schema in Fig. 7.9, and they are referred to 
the attribute OTHERS. A sample of the designed database with 
example data is given in Appendix B. 

In order to be familiar with some currently availatle 
CBMSs, the INGRES TEMS will ke introduced in Chapter VIII 
and the sample personnel database in Appendix B will also be 
implemented by using another DEMS known as ORACLE in Chapter 


CFFICER (MID,Rank,Name,Sex, Fri_bran,Sec_bran, Others) 
Key: MID 
UNIT (Ucode,Unane,Ucat,Uloc,Sup unit,Ufunc) 





Key; Ucode 

A_EDUCATION (Abran,Adeg,AID,Univ,Gdate) 
Key: (Abran, Adeg, AID) 

M_ EDUCATION (Ccode,Cloc, 4H=EID,Cgrade, Cdate) 
Key: (Ccode,Cloc, MEID) 

M_COUFSES (Ccode,Cloc,Ctitle,Cdesc,Cdur) 
Key: (Ccode,Cloc) 

MECICAL (Repno, HIT,Rdate, Eyecond, Earcond, Hstat,Others) 
Key: (Repno, HID) 

LANGUAGE (Nlanguage,FiID, Ldegree) 
Key: Nlanguage 


ASGREC (e- neores re a Ne Regaaee ReoGanken apELDE , 
ecbr, _acabr, toiled, ke hstat; THOG ASRS) 


ne ET ee | 


Key: (R_ucode,Reqnua) 
ASSIGNMENT (AMID, A_ucode,Orderno, Asgdate) 


| Key: (AMID,A_ucode,Asgdate) : 


Figure 7.9 Relational Schema for Personnel Datakase. 
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eS a eee ee 


DOMAIN NAME 


MID 

RANK 

PERSON NAMES 
SEX 

ERANCHES 
UNIT_CODE 
UNIT NAMES 
UNIT_CAT 
LOCATION 
UNIT FUNC 


ACADEMIC BRANCHES 
ACATEMIC DEGREES 
UNIVERSITY NAMES 


CATE 


COURSE/SCH_CODE 
COURSE/SCH_ TITLES 
COURSE/SCH_DESC. 
COURSE_GRADES 


EYE EAR_CONDITICN 
HEAITH STATUS 


LANGUAGES 
LANGUAGE_CAP. 
CRDER_NO 
CIHERS 
INTEGERS 


FORMAT and MEANING 


numeric 999999999 


CHAR(4); abbreviations of military 
ranks for the army 
CHAR(20); names of officers 


CHAR WEae Value 1s *M* or ‘*F* 


CHAR (8); abbreviations of military 
branches for the army 
CHAR (6) 3; unit codes 


CHAR(15); names of units 


CHAR (4); unit categories 
Fauve’; *nhosp', etc. } 
CHAR(15); names of locations 


CHAR(6); unit functions 


CHAR (5); abbreviations of 
academic branches 

CHAR (3); value is "BAT, (BSt*DR!', 
Vinee fot, Or SENG". 

CHAR(10); names of universities 


CHAR (9); format is DD-MMM-YY 
CHAR(6) ; course or school codes 
CHAR(10); titles of courses 
CHAR(30); description of courses 
CHAR(2) 3; value is 'A',"A-','B+t', 
TBE Bt  tcCet tc th tc 
mrerpr fpr, or ‘x? 


numeric 99; codes for eyes and 
ears 





humeric 99 


CHAR(10); names cf foreign 
anguages 
Numeric 9 


Chiko ew toLtmat is "999999-9* 
Suvctasswo& STRINGS where 


specified ae: 
humeric values where specified 
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Figure 7.10 Domain Definitions. 
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Figure 7.11 


ATTRIBUTE 


AID,AMID,FID, 
HID, MID,MEID 


RANK, R_RANK 
CNAME 
SEX 


ERI BRAN, R_ 
SEC_ERAN, RO 


: 
iccse, shp—un 
UNAME 

UCAT 
ULOC,CLOCB 
UFUNC 

ABRAN, R_ACABR 
ADEG 

UNIV 


ASGLATE,CDATE,GLATE, 
RDATE, REODATE 


CCODEA,CCODEE, 
RF MILED 


Clit: 
CDESE 
CGRADE 
CDUR, NUMOFPERS 


CRDERNO,REPNO, 
RECNUM 


EY ECOND, EARCOND 
HSTAT, RHSTAT 
NLANGUAGE 
IDEGREE 


Domains and 


DO MAIN 


MID 
RANK 

PERSON NAMES 
SEX 


BRANCHES 


UNIT _CODE 


UNIT_NAMES 
UNIT_CAT 

LOC ATION 

UNIT _FUNC 
ACADEMIC_ BRANCHES 
ACADEMIC_DEGREES 
UNIVERSITY NAMES 


WAL te 


COURSE/SCH_CODE 
COUFSE/SCH_ TITLES 
COURSE/SCH_DESCRIPTION 
COU RSE_GRADES 

INTEGERS 


ORDER_NO 

EYE_EAR CONDITION 
HEALTH STATUS 
LANGUAGES 

LANGUAGE CAPABILITY 
OTHERS 
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Attributes for Personnel Datakase. 


VIII. INGRES - A RELATIONAL DATABASE SYSTEM 


Aw. INTRODUCTION 


INGRES (Interactive Graphics and Retrieval System) isa 
relational database and graphics system which is implemented 
cn top of the UNIX operating system developed at Bell 
Telephone Laboratories. The implementation of INGRES is 
primarily programmed in "C", a high level language in which 
UNIX itself is written. Parsing is done by using YACC, a 
compiler-compiler available on UNIX. 

INGRES runs aS a normal user job on top of the UNIX 
operating system. The primary significant modificaticn to 
UNIX that INGRES reguiresS is a substantial increase in the 
maximum file size allcwed. 

In this chapter we shall describe some of the princiral 
components of INGRES. These include the cuery language CUEL, 
INGRES utility commands, and the storage structures 
supported. 


Be. QUEL: A RELATIONAL QUERY LANGUAGE 


QUEL (QUEry Language) is a calculus based language. Each 
interaction of QUEL ccntains one or more range-Statements of 


the fora 
RANGE OF variable list IS relation_nane 


The purpose of this statement is to specify the relation 
cver which each variable ranges. The variable list portion 
cf a RANGE statement declares variables which will be used 
as arguments for tuples. These are called tuple variables. 

tach QUEL interaction alse includes one or more state- 


ments of the forn 


Ccmmand [result_name](target_list) 
[| WHERE Qualification ] 


Here command is e€ither RETRIEVE,APPEND, REPLACE, On 
DELETE. We use square brackets ([ }) to denote "zero or 
more". For RETRIEVE and APPEND, result_name is the name of 
the relation which gualifying tuples will be retrieved into 
cr appended to. For REPLACE and DELETE, result_name is the 
name of a tuple variatle which identifies tuples to be msodi- 
fied or deleted. The targqetmrist 15 a Pls leon gene wea 


result _domain = CUEL Function ... 


Here the result_dcmains are domain names in the result 
relation which are to be assigned the values of corre- 
sponding functions. 

The goal of a query iS to create a new relation for each 
RETRIEVE statement. The relation so created is named by the 
"result_naze" clause and the domains in that relation are 
named by the "result_domain" names given in the target_list. 
The result _domain name may be omitted and is then taken to 
be the same as the Dctain_name in the function. The result- 
Mame ig an optional farameter to designate that the takle 
returned Ly the query be permanently stored in the database 
with the resuit_name as its identifier. Retrievals that 
specify a result_name do not display the result table on the 
terminal screen. The result_name cannot be the name cf an 
existing takle. 

Tc create the desired relation, first consider the 
Froduct of the ranges of all variables which appear in the 
target Jist and the gualification of the RETREIVE statement. 
Each term in the target_list is a function and the 
Qualification is a truth function, 1.€.; a £unction vith 
values true or false, on the product space. The desired 
relaticn is created fy evaluating the target_list on the 
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subset of the product space for which the Qualificaticn is 


true, and eliminating duplicate tuples. 


The QUEL examples in this chapter all concern the 


following relations. 


CPEACERMALD, RANK, CITY) 
COURSE (CCODE, TITLE,CDESCRIPT,DUR, LOCATION) 
COURSE_ATTENDED (MID,CCODE ,GRADE) 


The following are valid QUEL interactions. 
Example 1. Compute duration multiplied by 7 for course 
Wepsys. 


PANGEeon G6 £5 CCURSES 
RETReEVvE INTO ¥ 
(DUR_IN_DAYS = C.DUR * 7) 
Whee. 1 TLE =—e"Wepsyst 


Here C is a tuple variable which ranges over the COURSES 
relation, and ail tuples in that relation are found which 
Edtasty the qualification C.TITILE = “Wepsys". The result of 
the guery is a new relation, W, Which has a Single dcmain 
DUR_IN_ CAYS that has been calculated for each qualifying 
tuple. If the resulting relation is omitted, qualifying 
tuples are written in display format on the user's terminal 


or returned to a calling progran. 


Example 2. Insert the tuple (ID4,Cart, Jonn,Salinas) into 
Cr C PRaeelation . 


APE e TO OFFICER(MID = "ADA", RANK = “Capt", 
MaMa = Mionun”,Clhiy =. "Salinas"™) 


Here the resulting relation OFFICER is modified by 
adding tke indicated tuple to the relation. Domains which 
are nct specified default to zero for numeric domains and 


null for character strings. 
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Example 3. Cancel all the courses which are given in 


Monterey. 


RANGE OF C IS COURSES 
TELETE GPWNHERE CcLOCATICN = "Nontege, 


Here C specifies that the COURSES relation is to be 
modified. All tuples are to ke removed for which LOCATION 
has the value "Monterey". 

Example 4. Promote ali captains to major if the officer 


got the grade 'A' fre any course. 


RANGE OF © i5 OP RicER 
RANGE OF CA IS CCURSE_ATTENDED 


REEFLACE O(RANK = "Maj") 
WHERE O.RANK = "Capt" AND 
O.MID = CA.MID AND CA.GRADE = "a" 


Here O.RANK is to be replaced by "Maj" for those tuples 


in OFFICER relation where the gualification is true. 


C. <INGRES UTILITY CCHMANDS 


In addition to the above QUEL commands, INGRES supforts 
a variety of utility commands. These utility commands can be 
classified into seven major categories. 

1. Invocation of INGRES: 


IiNGRES database_name 


This command invokes INGRES. "Database_name" which is 
tke name of an existing database. (A database is 
simply a named collection of relations with a given 
database administrator.) This command executed from 
UNIX "logs in" a user, then the user may issue all 
other commands (except those executed directly from 


UNIX) within the environment of the invoked datakase. 
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Creation and destruction of databases: 
CREATEDB datakase_name 
DESTROYDB datakase name 


These two commands are called from UNIX. The CREATEDB 
ccmmand creates a new INGRES database. The person who 
executes this command becomes the Database 
Administrator (DBA) for the database. DESLROYES 
command removes all references to an existing data- 
Fase. The directory of the database and all files in 
that directory are removed. To execute DESTROYDE that 
person must be the DBA for "database_name". 

Creation and déstruction of relations: 


CREATE tablerame (columnname = format,columnname = 


foresees... ) 
DESTROY tablenane 


The CREATE command enters a new table into the data- 
base. The table is "owned" by the user who invokes 
the command. DESTROY removes the table from the data- 
base. Only the table owner may destroy a tarkle. The 
columns are created with the type specified by 
"format". The current formats accepted by INGRES are 
1-, 2-, and 4-byte integers, 4- and 8-byte flcating 
peint numbers, and 1- to 255-byte fixed length ASCII 
character strings. 


CPuekeGODY Of ata; 


COPY tabdlename(columnname = format,columnname = 


format,... ) intojfrom "filename" 


PRINT tablenare 
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The COPY command moves data between INGRES takles and 
standard files. "Tablename" is the name of an 
existing table. In general, “columnname" identifies a 
column in the table. "Format" indicates the stcrage 
fermat for the column's values in the file. To write 
a file, use the "into filename" form of the CCPY 
command. To copy data from a file to an INGRES table, 
use the “from filename" form of the command. The 
PRINT command displays the contents of a table speci- 
fied at a user's terminal under predefined formats. 


Storage structure modification: 


MODIFY tablenagze TO storage_Sstructure 
ON(keyl,key2,... ) 


INDEX ON tablename IS indexname(key1, key2,... ) 


Tre MODIFY conmgand changes the storage structure of a 
relation from one access method to another, Only the 
owner of a takle can modify that table. This ccmmand 
is used to accelerate performance of queries that 
access the tatle, particularly when the table is 
large or frequently referenced. The storage struc- 
tures currently supported will be discussed in 
Section D of this chapter. The indicated keys are 
domains in taklename which are used concatenated left 
to Tight to form a combined key which is used in the 
organization cf tuples in all but one of the access 
methods. The INDEX command creates a secondary index 
on existing tables in order to make retrieval and 
updating with secondary keys more efficient. The 
secondary key is constructed of columns from _ the 
primary table in the order given. A maximum of six 
"Columnname"s may be specified per index, but a user 


can build any number of secondary indexes for a 
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primary table. Only the owner of a table is allowed 
to create secondary indexes on that table. In crder 
to Maintain tke integrity of the index, users are not 
permitted to update secondary indexes directly. 
However, wherever a primary table is changed, its 
secondary indexes are automatically updated by the 
systen. 


Consistency and integrity control: 

DEFINE INTEGRITY ON range_var IS qual 

DESTROY INTEGRITY tablename(integer,...,integer| all) 
HELP INTEGRITY tablename 

RESTORE database_name 


Tke DEFINE INTEGRITY command acds die Lntegrar y 
ccnstraint for the table referred to by "range_var". 
After the constraint is defined, all updates to the 
table must satisfy "qual". "Quai" must be true for 
every existing row in the table when the INTEGRITY 
statement is issued. Updates that violate any integ- 


Eity constraints are Simply not performed. 


HELP INTEGRITY command prints current integrity 
ccnstraints on a specified table. DESTROY INTEGRITY 
removes integrity constraints from ae table. TO 
destroy constraints for a table, the integer argu- 
ments Should fe those printed by a HELP INTEGRITY 
ccmmand on the same table. Only the table owner nay 
destroy integrity constraints. 


The RESTORE ccmmand checks and cleans up a database 
after an INGKES or operating system crash. RESTCEE 
Should be executed after any abnormal termination to 
assure database integrity. The RESTORE command is 


only available to the database administrator. 
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7. =%Mascellaneous: 
ee 
SAVE tablename UNTIL expiration_date 
PURGE database_name 


HELP may be used to print information about INGRES 
features, Gefinitions of vievs, protections or 
permissions, cr information about the contents cf the 
database and specific tables in the database. SAVE 
is the mechanism by which a user can declare his 
intention to keep a table until a specified time. 
PURGE is a UNIX command which can be invoked by a 
database administrator to delete all relations whose 
"expiration_dates" have passed. This should be done 
when space ina database 1S exhausted. (The database 
administrator can also remove any relations from his 
database using the DESTROY command, regardless of who 
their owners are.) 


De. STCRAGE S2RUClIUR ES 


Cfiten the relaticn (table) will be stored in such a way 
that a complete scan is not required. Also secondary indices 
can ke declared and are used if possible to limit the number 


of tuples examined. 


There are five ucdes of relation storage structure. A 
relation owner can decide both storaye structure and what 
secondary indices (if any) to construct, then both decisicns 
will te done automatically by the systen. The five main 
storage structures are: 

1. ISAM : indexed sequential access method structure, 
duplicate rows removed 


2. CISAM : compressed isan, duplicate rows removed 


Se hionwD es LCandem@hash “Storage structure, duplicate 
[cwS removed 

4. CHASH : compressed hash, duplicate rows removed 

aie JHEAL : unkeyed and unstructured 

For the first four structures the key may be any ordered 
collection of domains. These schemes allow rapid access to 
specific portions cf a relation when key values are 
supplied. The remaining non_keyed scheme (a "HEAP") stores 
tuples in the file independently of their values and 
provides a low overhead storage structure, especially 
attractive in situations requiring a complete scan of the 
Eelation. 


The CRACLE relational DBMS has been used to isplement 


the Fersonnel Datakase because OE 


Clarity. SGL is the lanyuage that is 


its Sinplverty. aoa 


used to access and 


control data in an ORACLE database. AS a result of this, SQL 


is used as DSL in the database operations such as table and 


View creation, updating data, and in ,ueries. There are nine 


relations in the Personnel Database. 
designed database with example data is 
A relation can be created using CREATE 
of CREATE command te create OFFICER'S 


follows: 


UPT> CREATE TABLE OFFICER 


( 410 NUMBER(S) NOT NULL, 
RAN< CHAR(4), 
ONAME CHAR(9), 
SEX CHAR(1), 


P&T eBRAN CHAR(6), 
SEC rBRAN CHAR(6), 
OTHERS CHAR(8) ); 


CON OWN & Wt 


Taole created, 


A sample of the 
Shown in Appendix B. 
command. An example 


relation can fe as 


After the relaticn is created, tuples of OFFICER can he 


inserted using the INSERT command. 


JFI> INSERT INTO OFFICER 


2 VALUES (27363, 'caot','Jonrson','m',*tartill','oilor'); 


l record created, 


UFT> INSERT INTO OFFICER 


e VALUES (12239, 'naj',*hernandez','m','inttry','spefe'); 


! record created, 


le 


Users of the Perscnnel Database wish to get scme infor- 
Bation by asking the following sample queries. 
Vel wcieedr i tiuplesaimeamne relation OFFICER using SELECT 


command. 


WET> SEEECT « 
2 FROM OFFICER; 


MID RANK 9NAME SEX ORT*#3RAN SEC*BRAN OTHERS 


27363 capt Johnson n artil pilot 
12239 maj cernandez inftry soefe 
P 32458 lit Woobins f airdef adp 
43596 21t Smith m nedic piloe 
10999 JIcol 3Srown m inftry ado 
35768 13t Greenderg n sijcor pilot 
29364 Capt Janes 7 nileng soefe 
16745 maj Leignton 1 finance ado 
10792 co} Stone m ordanan artildl 


9 records selected. 


Z.- List all officers who were assigned between the date 
f-An—-19606 and I-JAN-1980. 


UFI> SELECT MID,RANK,ONAME 
2 FROM OFFICER 
3 WHERE “I9 IN 
u ( SELECT amMIO 
5 FROM ASSIGNMENT 
6 AHERE ASGDOATE BETWEEN ‘'l1leJANo66' AND '1eJAN=-80'); 


MJD RANK OQNAME 
10792 col Stone 
10999 col 3rown 
16745 maj Leiaqhton 
27363 capt Jonnson 
29364 capt James 
12239 maj xoernandez 


6 records selected, 


lites 


3. List military IDs, Panks, namespedadeprinary Seranche- 


of all officers who have taken an ‘adp* course. 


UFI> SELECT MID,RANK,ONAME,PRI®3SRAN 


2 FROM OFFICER 

3 WHERE “ID IN 

4 ( SELECT METO 

5 FROM MeEDUCATION 

6 AHERE CCODEA IN 

7 ( SELECTS CEDES 

8 FROM MeCOURSES 

9 WHERE CTITLE = ‘ado’ )); 
; MIO RANK ONAME PRIe*BRAN 


32458 Itt Robbins airdef 
10999 lecol Brown inftry 


4. List all unit categories for units. 


UFI> SELECT UNIQUE UCAT 
2 FROM UNIT; 


UCAT 
div 
org 
deo 
hoso 
reg 
det 
oN 


7 recordas selected, 
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tranches 


course 


Pee roe malt Peary ils, ranks, 
mor. ai) 
oou02* . 


hames, sex, and primary 


officers who speak German and took the 


UFI> SELECT MID,RANK,ONAME,SEX,PRI*BRAN 
FROM OFFICER 
WHERE “ID IN 

( SELECT MEID 


. 


OoOOOn eo WoW 


= 


FROM MeEDUCATION 


AHERE CCOOEA = 
AND MID IN 
(Sttect FI 


FROM LANGUAGE 


AHERE NLANGUAGE = 


MID RANK JNAME 


PSS 002.) 


‘german’ 


SEX PRI ¢S3RAN 


35758 tle 


6. Crder all military education 


and within 


crder. 


tt 


Greenberg m 


course _ccde put them in 


JE SELECT * 
2 FROM MeEQUCATION 


sigcor 


tuples by course_code, 


descending course grade 


3 ORDER 8Y CCODEA,CGRADE; 


records selected. 


CCODEA MET) 
AAI02 43595 Ae 
AD002 32458 A 
AS003 27363 B+ 
$302 32455 B+ 
CS509 10999 Ae 
HS706 43595 A 
[A076 16745 Ae 
1S00Ss 10999 A 
T3005 12239 Ae 
0C092 10792 Ae 
$S002 35768 B+ 


CGRADE COATE 


12-JUL<94 
23eNOV=82 
132APR-77 
26*0CT=84 
3L\-JAN©77 
22eJUN=482 
22°NOV=76 
11-OCT-70 
30*°SEP-72 
OL-DEC-69 
26-FEB=82 


A view may derive data from more than one relaticn. This 
is done ty defining a view using a join query. An example of 
a view (view OFFICERELIUC) is Shown below. 

7. Create an OFFICEREDUC view from a join of the OFFICER 
relaticn to the A_EDUGATION relation. 


JFI> CREATE VIEW OFFICEREDUC(VID,RANK,VNAME,VBRAN, VDEG,SEX) AS 
2 SELECT MIO.RANK,ONAME, ABRAN, ADEG, SEX 
3 FROM OFFICER,Ae* EDUCATION 
4 AHERE OFFICER.MID = AtEDUCATION, AID; 


- Jiew created. 


8. Count the numker of officers who are captain and have 


"BS' academic degree. 


UFI> SELECT COUNT(CRANK) 
2 FROM OFFICEREDUC 
3 AHERE VOEG = *83'! 
4 aANO RANK = ‘eapt'’; 


COUNT (RANK ) 


2 


9. List all assignments, ordered by assignment date, for 


orficer identified by '10792' as MID. 


UFI> SELECT MIOD,ONAME,UNAME, ASGDATE 

2 FROM OFFICER, ASSIGNMENT, UNIT 

3 AHERE “IO = 10792 AND NID = AMIO 
Q AND UCODE = AeuUCODE 

S ORDER 8Y ASGOATE; 


MIO ONAME UNAME ASGOATE 
10792 Stone 9th Art Beg OL-SEP-66 
10792 Stone 2nd Inf Div LIeJAN@-7]1 
10792 Stone 64th Ord Deoot 15-APR=-78 


10. List all officers and courses for officers in 


of the courses with duration of at least one year. 


I> SELECT RANK,ONAME,CDESTC,CLICB,COLIR 
2 FROM OFFICER,Me COURSES, MeE DUCATION 
3 aAHERE “ID = MEIO 

OU VANOSEGIDEA = CCODEB 

5S ANd COUR >s 523 


ANK ONAME EDESE CLOCES CDUR 
lt Smith Aca. of Health Sci. Ft .Houston, Tx 96 


}t Smith Army Aviation Schoo!} Ft Rucker, AL Se 


ala 


terms 


11. Compute and display the sum of duration of course(s) 
which was taken by officer ‘'Smith'. 


UFI> SELECT SuM(COUR) 


2 FROM OFFICER, MeCOURSES,MeEDUCATION 
3 wWHERE MID = YEIO 

G4 AND ONAME S&S "Smith! 

S AND CCODEA = CCODEB; 
SUM(COUR) 
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12. List all officers and assignment requests, including 
rank, officer name, military_ID, and request number, for 
assignment reyguests where officers' Specialties meet the 


reguirements specified in that reguest. 


UFI> SELECT REQNUM, 41ID,RANK,ONAME 


2 FROM OFFICER, ASGREGQ, ACEDUCATION, Me EDUCATION, MEDICAL 
3 aAHERE MID = MEID 
4 AND “IO = AID 
S AND 4IO = HID 
6 AND PRI*®3RAN = RePRIBR 
7 AND SECe*SRAN = ReSECSR 
8 AND ASRAN = ReACABR 
9 AND CCODEA = ReMILED 
10 ANDO HSTAT <= ReHSTAT; 
REQNUM MID RANK ONAME 


32768524 12239 maj Hernandez 
03108423 16745 maj Leighton 
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A. INTRODUCTION 


The PEINCipal function of the DBMS is to- store, 
retrieve, and modify data. However in an operational envi- 
ronment the DBMS should provide other important functions. 
In this chapter, we will first describe major DBMS functions 
and then discuss three of these functions in detail : 
LTecoveLy, concurrency, and security. 

Major DBMS functions are described by Codd, JP Tapa 
[Ref. 20sp. 114], and they are shown in Figure 10.1. 


1. Store, retrieve, and update data. 

Ze Provide recovery services in case of failure. 
3. Provide concurrency control services. 

4. Provide security facilities. 

Cc 


- Provide integrity services to enforce database 


| 
| 
: 
| 
constraints. | 
6. Erovide a user-accessikle catalog of data 
descriptions. 
7. &Merpomt logical transactions. 
| 8. Interface with communications ccntrol programs. | 
| €. Frovide utility services. | 
| 
ae a 
ee a ee J 





Figure 10.1 Major Functions of a DBMS. 


The first and fifth functions have been discussed in 
previous chapters. RecoveLly, Seoneurrency control, and 


security facilities will be discussed later in this charter. 
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The management of large, complex databases is difficult. 
Maintaining a database with ten record types and hundreds of 
data-items can be time-consuming. If a database is prcecessed 
ty hundreds of application programs then changes of records 
and data-items can be risky. Questions like "Which programs 
are affected by a change to eight-digit 
Unit_Location Codes?" or "Which records) @eentain iti eae 
frequent. Since most databases are self-describing, much of 
the data needed to answer these questions are stored within 
the database. However, these information may not be readily 
accessed by humans. For that reason, "a user-accessitle 
cataicg" which contains data descriptions and data about the 
relationship between frograms and data will be very useful 
to the user, and it should be provided by DBMSs. 

A logical transaction is a sequence of activities 
performed atomically. Usually, transactions include several 
actions on the datatase. Unfortunately, the DBMS product 
cannot know which grcups of actions are logically related. 
Thus the DBMS must frovide facilities for the application 
programmer to define transaction boundaries which are needed 
in handling concurrent control and recovery functions. 

In addition to these functions, the DBMS must interface 
with a communications control subsystem which controls the 
flow of transactions to application programs from the DBMS. 
Finally, the DBMS must provide utility programs to facili- 
tate datakase maintenance. These utility programs may be 
used to unload, reload, and execute the database; or they 
may ke used to make twass insertions or deletions of data in 
or out of the database. 

No current DBMS frovides all of these functions ina 
satisfactory way. These capabilities can be used as a 
checklist of decisicr criteria for a DBMS. A system that 


does not provide most of them is not truly a DBMS. 
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Be. RECOVERY 


Ccmputer and database systems can fail in many ways. 
Computers stop unexpectedly, disk heads crash, operators nay 
drop disks, programs may have bugs, and so on. Database 
systems must include not only a variety of checks and 
contrcels te reduce the likelihood of failure, but also an 
extensive set of procedures for recovering from the failures 
Guat wid inevitably occur despite those checks and 
eontrcis. 

In an operational environment there are many fossitrle 
causes of failures, such as: 

- Programming errcers; in an application oor in the data- 
base systen, 

- hardware errors; on the device or the channel or the 
EPun 

- operator errors; such aS mounting a wrong tape, 

- fluctations in the power supply, 

- fire in the computer system room. 

If such errors occur during a database interacticn, the 
databrase can be left in an inconsistent state. Recovery 
software 1s used to restore the database to some rprevicus 


consistent state. 


1. Fecovery via heprocessing 


There are a variety cf recovery alyorithms. The 
Simplest way is to keep back-up a copy of a database. This 
copy is created periodically, once or twice a day. fThen, 
when a failure occurs, the last back-up copy is used to 
restore the database. Any transactions since that copy was 
made are run ajain. This algorithm is called "recovery via 
reprocessing" and it has several drawbacks. BLcCSe,. Cepro— 
cessing transactions takes the same amount of time as 


processing them the first time. This means that one day will 
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ke reguired to recover one day of processing. If the system 
is heavily loaded, the system may never catch up. Second, 
when transactions are processed concurrently, it is impos- 
Sible to guarantee that they can be reprocessed in the same 
order as they were originally processed. For these reasons, 


reprocessing is not a viable ferm of recovery. 


2. Transactions 


The fundamental purpose of the database system is to 
process transactions. A transaction iS a program, or a 
program part, that can read frcem or write into the datatkase. 
It consists of the execution of an application-specific 
sequence of operaticns. These operations can be of five 
types: BEGIN TRANSACTION, READ, WRITE, COMNDa and 
ROLLBACK. All transactions Fregin with BEGIN TRANSACTION 
operaticn. READ causeéS a page or record to be read from the 
database. WRITE causes a new copy of a page or record to be 
written into the datakase. CCMMIT tells the system that the 
transaction has terminated succesfully and that all of its 
updated pages or records should be permanently reflected in 
the datakase. ROLLEACK tells the system that the transaction 
has terminated abnormally and that the records or fages it 
wrote into should be returned to their previous state. A 
transaction can have cnly one COMMIT or ROLLBACK precessed, 


and transactions cannct be nested. 


The cecovery Manager processes the READ, While 


COMMIT, and ROLLBACK commands. It aiso handles system fail- 


ures so it provides reliability for the DBMS. 
3. Recovery via Kollback/Roliforward 


= a <= eee 


This approach uses the Following four step 


algorithns 


WZ 


1. kecreate the outputs of ail successfully ccmpleéeted 
transactions. (Transactions which are ended with 
CCMMIT operation.) | 

Pavone dtl trdnsdctronsm in SEOCess at the time of 
failure. 

3. Remove database changes generated by aborted trans- 
actions. 

4. Restart aborted transactions. 

This algorithaz will appropriately recover the data- 
base. Some transaction outputs cannot be undone. Zhis 
outputs are calied "Real outputs" by Gray in [Ref. 21:pp. 
we 242). Real outputs are messages which are received by 
people who are using the system, like order confirmaticns or 
inputs to other Taouede lone. The message 
UOme Tt CER Ds 9999999), eS ASSIGNED LO THE UNIT, 
rT iDe9 939) ASSLGMMENT ORDER NO IS 99999-9" are examples 
eof real outputs. Because they cannot be undone, real outputs 
should net be produced until the transaction is completed. 
It 1s recommended that a log of real outputs be maintained. 
When the transaction is completed, the actions onthe lcg are 
updated and the real outputs become visible. If a failure 
occurs when the real cutputs are being produced, each output 
could be numbered and a log kept of the real outputs that 


have reen produced. 


To apply UNDO (rolling back a transaction) and REDO 
(cclifcrcward a transaction) processes to a database systen, 
a log should be kept of transaction results. The log 
includes the old and new values of all items updated by the 
thamodCewonm, and it is in chronological order. The tlog 
resides on either disk or tape. When a failure occurs, the 
log is used to both UNDO and REDO transactions, as shcewn in 


Faguresl.2 and Figure 10.3, respectively. 
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Figure 10.2 UNDO Transaction Procedure. 
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Figure 10.3 REDO Transaction Procedure. 


To UNDO a transaction, the log must contain a copy 
of every database reccrd before it was changed. Such records 
are called before images. By applying before images tc the 
database an UNDO prccedure is performed. To REDO a trans- 
action the log must ccntain a copy of every database record 


after it was changed. These records are called after images. 
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By applying after images to the database a REDO procedure is 
performed. Possible data-items of a transaction log are 


shown in Figure 10.4. 


a a 
TLlansaction ID Operation Type | 
Reverse Pointer Object | 

| Forward Pointer Old Value | 

| Tine New Value | 

: 


Figure 10.4 Data-items of a Log Record. 


For identification purposes each log transaction has 
a unigue ID. All images are linked together with a double- 
linked list. These fcrward and backward links can be used by 
the reccvery manager to locate ali records for a particular 
transaction. Other data-items are: the time of the action, 
the type of the operation (modify,insert, etc.), the crject 
such as record type and identifier, and the old and new 


values. 


There 1S an interval between writing a change tc the 
datakase's stable stcrage and writing the log record repre- 
senting that change. These are two distinct operations. This 
fact introduces two guestions: What happens if a failure 
occurs in the interval between these two operations? What 
should be done to avcid improper applications? 

Suppose that in fact such a failure does occur, so 


that cnly one of the writes (the first operation) 1S 


AZ 


performed and the otker is lost. If the performed operation 
is the database write, there will be changes in the database 
that are not recorded in the log so the UNDO process 12 Hex 
possitle for these changes, It is obvious that, for safety, 
the lcg record should always be written first. Therefore we 
1. A transaction is not allowed to write a record tc the 
Stable storage of the database until at least the 
before image cf the log record has been written to 

the physical l1cg. 

.2. A transaction is not allowed to complete CCMMIT 
Frocessing until both the before images and the after 
images of alllog records for the transaction have 
keen written tc the physical log. 

If a failure cccurs, a change may be recorded in the 
log and not in the database. In this case, the reccvery 
manager may attempt to undo changes that have not yet 
cccured. This is not a problem, because the recovery manager 
will cnly be placing tEefore images in the database. Records 
will ke replaced by ccpies of themselves. This is a wasteful 


Operation but not harnetul-. 


C. CCHCURRENCY CONTECL 


Given aocorrect state of the database as input, a 
correct transaction will produce a correct state cf the 
datakrase as output. Even if all transactions are individu- 
ally correct, however, it is fossible in a multiuser systen 
for transactions that execute concurrently to interfere with 
one ancther in such a way as to produce an overall result 
that is not correct. AS an example of that kind of interfer- 


ence, we will consider the "lost update" problen. 


Ze 


1. Concurrent Update (Lost Update) Probien 





The lost update problem can be represented as shown 


ies LgGUre 10.5. 


| Time Transaction A (TA) Transaction B (TB) | 
£1 tv moe cr cH * 
| copy, tuple . | 
| * — | 
EZ 7 copy tuple i from | 
P x re ation 21 
: t3 modify éqnie 2 * | 
and ufdate - { 
* * | 
*x i *x 
tU x modify tuple aL | 
i : and update | 











Pigure 10.5 Lost Update Problen. 


{Transaction Ais intended to change some field F in 
fuple i: lets say will double the value of field F. 
Transaction B is intended also to double the value of that 
same field. Thus, if the initial value of that field is 2, 
then running the two transactions one at a time, without 
concurrency, will produce a final result of 8. However, the 
particular concurrent execution seguence shown in Figure 
10.5 produces a final result of 4. That particular execution 
sequence is therefore incorrect. In this situation, we can 


say that TA's update is lost because TB overwrites it. 


The most common method of concurrency control is to 


use locks. One lock iS maintained for each user. The tern 
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user refers to -the user of DBMS, not necessarily the systen 


user. Thus a user cam be either sagpemson using tie Deis 


queryyupdate facility via a terminal, or an application 
Frogram that calls upon the [BMS for service. A prcgran 
obtains all such locks before Making any updates. 


Concurrency control tust ensure that at most one prcgran 
gets the lock for one database part. Hence, if a progran 
wishes to move a locked item to the program working storage, 
it must wait until the previous program releases the lock. 
The implementation of this prccess differs from one systen 
to-ancther. In many implementations, user programs include 
commands to lock the reguired records before updating then. 


This can be represented pictorially as shown in Figure 10262 


| 
| Tine Transaction A (TA) Transaction E (TBE) | 
| 
* % * 
* * 
ta lock 1. a oeao = | 
1 fron tela 1on R1 : | 
we * attempt - place a | 
* * Leck on ki 
* * wait | 
£3 Nodify tuple a, wait 
| * and update wait { 
* wait 
t4 release R1 eae 
i> * FO Ciren 1. 90 See 
| ; i from mauite ion ki. 
£6 a Mod ale tuple i, and 
| * * update 
* * 
ee) * release R1 | 
| | 
Se ——— ee ee 


Figure 10.6 Resource Locking. 
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It is seen that transaction B is now made to wait at 
time t2, because its request for a lock on R1 at that time 
conflicts with the lcck already held on R1 by transaction A. 
Transaction B resumes after transaction A releases its 
Hock. This kind of lock mechanism will provide a correct 
final result for those two transactions. Lost update prob- 


lems can be solved by the lock mechanisn. 


3. Deadlock 


Locks can introduce the problem commoniy known as 
deadlock. A deadlock occurs when two transactions, say TA 
and TE, each places locks on relations, say R1and R2 
respectively, and then each transaction attempts to place a 
lock on the others already locked relation. The order of 


processing can be as shown in Figure 10.7. 


Tine Transaction A (TA) Transaction B (TB) | 
: ‘ | 
x * x 
t1 lock R1 * 
* x #e | 
* * x 
2 = BOCK RZ 
* i * 

* * 
cS atten ios Eo lace * | 
a loc * | 
* aen = | 
* walt x ] 
ty wait SB a toe lace i 
wait | 
* wait wat | 
* walt walt 


Figure 10.7 Deadlock Problen. 
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Both transactions are then waiting for each other to 
release a lock. In the database environment, the usual step 
to resolve cr "to break" the deadlock is to rollback one of 
the programs. Breaking a deadlock consist of chocsing a 
"victim", one of the deadlocked transactions; and rolling it 
kack. The victim is not necessarily the transaction that 
actually caused the deadlock; it may be the one holding the 
fewest locks, or the one that was most recenly started, or 
the cne that has made the fewest updates. The roliback 
process involves the following jobs: 

-1. Terminate the transaction, victin, and undo all of 
its updates. 
2. Release all the locks of the transaction; resources 


are now allocated to other transactions. 


So far, in our examples, we assumed that the unit of 
locking is the individual record. However, the level of the 
lock can be different in different applications, or in 
different DBMSs. Locks, at the highest level, can te aprflied 
to an entire database. This strategy is used by DBMS fred- 
ucts that invoke tte lock for a short time during the 
processing of a single database request. Locks can also be 
applied, at the lowest level, toa specific field within an 
individual record. In between these extremes, locks can be 
placed on records, cn pages or blocks, and on files. As 
usual, there are tradeoffs among these alternatives. A lcck 
of the entire database is simple for the DBMS to manage. 
However, throughput nay be slow because of less concurrency. 
On the cther hand, locks of small granularity will be 
complex to manaye but throughput will tend to te faster 
because of more concurrency. The choice among alternatives 


depends cn requirements. 
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Be. DATAEASE SECURITY 


The security in database environment 1s protecticn of 
the database against unauthorized disclosure, alteraticn, or 
destruetiom. The subject of database security has many 
different aspects and approaches, Suciwas piySical @erct €e— 
tion, hardware contrcls, uSing passwords, or using authori- 
zation tables. Here we are concerned primarily with 
restricting certain users so they are allowed to access 
and/or mcedify only a subset of the database. 

Gcod security means that people have access to the data 
that they need to accomplish their job function, and no 
more. Jcb functions vary, and for that reason data access 
authorizations will vary. A tatle called authorization rules 


is used for that purpose, and was developed by Fernandez, 


summers, and Yood in [Ref. 22:p. 5]. 


Authorization rules are compiled and stored in the 
system dictionary. First, these rules will be entered into 
the system, tnen they will be enforced. The authorization 
ruiesS compiier and the corresponding enforcement mechanism 
together make up the security subsysten. 

In the applicaticn environment it is convenient to use a 
Matrix fcr authorization rules. The matrix is called an 
autnorization matrix in which rows correspond to users and 
columns correspond tc data objects. The entry A[i,j] crepre- 
Sents the set of authorization rules that apply to user i 
with respect to data cbject j. An example of an authcriza- 


tion matrix is shown in Figure 10.8. 


Sophistication of the security subsystem can be measured 
Ey the granularity of the objects. For example, some DBMS 
systems support authcrization cnly at the level of whole 
Felations, others permit authorization at the level cf indi- 


vidual fields. In otr example authorization is based on the 


a RS 


DATA OBJECT1 DATA OBJEGCT2 DATA OSaReTS 
(UNIT ( COURSES 








| { 
{ OFFICER 
_retetien) .. a ea eee ee 
UsSinn ALL | ALL | ALL | 
(Frown) 
| USER2 NONE NONE ! NONE | | 
(John) | | | 
se S| ee ee ee | | 
USERS ALL READ UPDATE 
(Fersonel 
»~  CLliices) he eee | | 
USERY NONE | READ READ | | 
(Frog-3 ) fo = i. 
READ | | 
Sth ae UPDATE READ ALL 
| (Education | | | 
Cffice ) Vie el | 
| | 
| 


| 


Figure 10.8 An Example for Authorization Matrix. 


hames cr opnjects and not on their value. This ais called 
value-independent control. In this schema, the system can 


enforce the controls without having to access the data 
objects themselves. It is also possible to provide value- 
dependent control in that we can extend the entries in the 
Matrix to include an optional access predicate. FOr 
example, the entry 

SPOIGRISE I ae 

FRECM ChE EGER 


WHERE RANK = ‘'CAET!? 


might be used to allcw SELECT access to some officers and 


not others. 


Ley, 


PitouorezdeLon Lulies Gan also specify that certain field 
combinations are prckibited, even though the individual 
fields within the combination may be accessible. It is also 
necessary to control access to programs. Moreover, itis 
important to control access to the authorizaticn matrix 
itself. 


ies 


Information 1S a k-asic resource, like people or mcney, 
for an enterprise, and it should have a professional manage- 
ment group that is responsible for its effective use 
throughout the enterfrise. For achieving this task, a new 
stafi function called information resource management (IRM) 
has been proposed. This function, in most cases, should 
establish policies and procedures to guide users, system 
developers, and managers so that their decisions will be 
consistent and compatible and employ the best in currently 
availakle technology. In a DBMS, this function is referred 
to as Database Administration (DBA). [Ref. 15:pp. 168-183] 

Cn the other hand, the personnel administration function 
for managers of an organization must have complete ccntrol 
over evaluating, assigning, and firing their own employees. 
In order to perform this task satisfactorily and eifec- 
tively, the managers have to make their own decisions very 
accurately. Sometimes, they are forced to make such deci- 
Sions in a short pericd of time. Those factors in a powerful 
personrel management can be provided by having ae well- 
designed personnel database and a suitable DBMS. 

James P. Fry and Edgar H. Sibley state in their 1976 
paper [Ref. 23] that the objectives of database management 
are: 

-tc make an integrated coliection or data available to a 
wide variety of users (data availability), 

-tc provide for quality and integrity of the data (data 
guality), 

-to insure retention of privacy through security meas- 


ures within the system (privacy and security), 
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-~to allow centralized control of the database which 
is necessary for efficient data administration 
(Management contrcl), and 

-to provide a high degree of data independence. 
Considering those major objectives and some advantages 

euch aS Siupircity, ease of use, data independence, and 
theoretical foundation, the relational database mcdel has 
been found to be convenient in designing such a perscnnel 
database system. The other database models are more complex 
and mcre difficult tc implement. 

After the organization's requirements are understood, 
the process usually kegins by choosing the data model that 
seems most appropriate and then proceeding to a detailed 
evaluation of only tke available DBMS products that support 
the selected model. This is the problem of choosing a DBMS. 
Several committees are working on this froblem such that all 
DBMS's provide the same functions and the same interfaces 
will be standard. In this thesis, the ORACLE DBMS is used to 
Show the inplementaticn stage of the personnel database. 

In conclusion, it is useful to emphasize that it is not 
only important to design an efficient database for an enter- 
prise but also it is required to maintain and develor the 
database by permanently monitoring its performance to 
maximize efficiency as a final operational responsibility of 


the datakase administration (DEA). 
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APPENDIX A 
SEMANTIC DATABASE DESIGN 


The detailed description of the Semantic Datatase Model 


(SDM) design for the Personnel Database which is mentioned 


in Chapter V is shown below. 
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| OERZCER 4 


Hoaitary ID 
description 
value class 
mandatory 


Rank 
value class: 
value class: 


Name 
value class: 


Birth date 
value class: 


Bec wer ee date 


value class: 


Native country 
value class: 


Sex 
value class: 


Marital status 
value class: 
value class: 
value class: 
value class: 

Primary branch 


descripticn: 
value class: 


descripticn: 
value class: 


value Class: 


i mm a a a, i _ RR em mM RR ARRRN —_  g AAI r —p _—aM ——a C —  fp — a A  ——— aagy, 


nultivalued 


Figure A.1 STH Design for Personnel Database. 


Current_address 


description: All officers who are on active-duty. 


member attributes: 
ate e humber for each officer 


not changeable 


RANK 


Date_of_promotion ° 


DATE 
PERSON NAMES 


DATE 


_to_active-duty 
eSCription: Date of first day of being on 


active-duty. 
DATE 


COUNTRY 
SEX 


MARITAL_STATUS 


Number _of children 


INTEGERS 


Permanent address 


ADDRESS 


ADDRESS 


BRANCHES 


Secondary _Eranch 


BRANCE@S 


Academic _education 

ACADEMIC MAJOR 

match : Academic_kranch/Academic_degree of 
ACADEMIC MAJOR on AID. 


Ue) 7, 
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Figu 


_ 


Military _education/courses 
value Class: Seep EDUCATION /COURS ES 
Match : Course/Mil AAT school code of 
; ILITARY EDUCATION/COURSES on MEID. 
multivalued 


Health condition 
value class: MEDICAL INFO 
match : General _ health _ status of 
MipreGhpeiNrco on Haye 


Foreign_language capakilit 
value class: FOREIGN. oa UAGE 


match : Foreign lan e of 
: FOREIGN_ ‘LAN UA E on fae 
multivalued 


Unit_assigned ; ' 
deScripticr: Units which the officer_has teen 
assigned until current date. 
Vaeuliccurais =: I ; 
inverse : Officer_assigned 
multivalued 


identifiers: 
Mariel tar y ae 


ACATEMIC MAJOR 


description: pe of academic branch,the degree 
arned for that branch, “location 
and nane of the universty which 
the officer attended. 


member attributes: 


Academic_branch 
descripticn: Branch such as Computer Science, 
Electrical GO bee a0 
value class: ACADEMIC_BRANCHES 


Academic_degree 
descripticn: Degree such as Bachelor of 
SCL ORC Ete S of Science, 
oD ae Doctorate. 
value class: Die ery GREES 
AID ae : 
description: pee ey of the officer who 


earned at degree. 
value class: MID 
mandatory 
Pare 


descripticn: Date at which the degree earned 
value class: DATE 


re A.2 SDM Design for Personnel Database (cont'd.). 
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Figure #.3 


Name o cpentee ge el 
value class: UNIVERSITY_NAMES 


Location_of_ university: 
value class: COUNTE 


1Gemntisatiers: 


Academic_branch + Academic_degree + AID 


MILITARY _ EDUCATICN/COURSES 


description: Information about the military 
school daeuated Or military course 
attended, location of school, grade 
and date of graduation. 


member attributes: 


COumse, 11 litaly =senoo: Code 
Vee Clal<o;  “COURS&/5CHOOL CODE 


Location 
value class: COUNTRY 


MEID 

feseurpe Toni ctarvelp OL the officerewho 
attended the school or course. 

value class: MID ; 

mandatory 


Course/Schocl title 
vole Class: COURoss/ScCnooL TLILES 


Description 
qGesCLmpercn: dJéxtual ore 
value class: COURSE/SCH 


Duration. _— 
descripticn: Duraticn of the military 
education in weeks. 
Value class: INPSGERS 


Date 

desecriptien: Guadwation @ate of an officer 
from the course or school. 

value class: DATE 


Grade 
Gescripticr: Grade earned for that course 
or military education. 
value class: COURSE_GRADES 
identifiers 


Course/Military school_code + Location + MEID 
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SDM Design for Personnel Database (cont'd.). 
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Figure A.4 


MEDICAL INFO 


description: 


Medical information and overall 


medical status for an officer. 


Medical _repert 


descripticr: 


value class: 


el s070 ee. 
descripticn: 


value class; 
Mandatory 


Date : : 
descripticn: 
value class: 


Height ~~. 
description: 
value class: 

Weight  —s_, 
desGri pticn: 
value Class: 


* 


Blood _ pressure 
value class: 


Eye conditicr 
d@sGriptien: 


value class: 


EdbecondiveaGn 
déscripticn: 


value class: 


inter nals 
descripticn: 


value class: 
General health 


description: 


value class: 


member attributes: 


number 

Medical report number of last 
eieean td OE SURG Omircer . 
REPORT NUMBER 


Military ID o@eeherotricer™to 
oom the information belongs tc. 


Date of the report. 
DATE 


Height of the officer. 
HEIGHT 


Her ee Of the officer. 
WEIGHT 


BLOOD_PRESSURE 


Descrikes the condition of both 
eyes of the officer. 
EYE CONDITION 


Descrites the condition of bcth 
ears of the officer. 
EAR CONDITION 


Describes the condition of. 
internal organs of the officer. 
INTERNAL CONDITION 


status ; 

An overall evaluation of 
conditions of all bod arts. 
This status is descrited by some 
member attribute values of this 
entity class. 


HEALTH STATUS 


SDM Design for Personnel Database (cont'td.). 
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Figure A.5 SDM Design for Personnel Database (cont'd.). 


identifiers: 
Medical _report_number + HID 
FOREIGN LANGUAGE 


description: It is used to define the officers 
creign language capability. 


member attributes: 


Foreign_language 
value class: LANGUAGES 


FID 

descripticn: Military ID of the officer who 
has the Tanguage capability. 

value class: MID 

mandatory 


Deyree_of Pay Sea 
value class: LANGUAGE _CAPABILITY 


identifiers: 


Foreign_language + FID 


UNIT 

description: L[escription of a unit. Unit code, 
unit hame, unit categori,locaticn, 
Supers unit,unit status and 

officers assigned to the unit. 


member attributes: 
Unit _code 
vaiue class: UNIT_CODE 
Mandatory 
not changeable 


Nane 
value class: UNIT_NAHES 


Unit_category 
description: Organizational level of unit 
such as corps, tcrigade, 
divisicn. 
value class: UNIT_CAT 


Location. 
desemrpticns Location Of unit. 


value class: UNIT_LCCATION 
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| 
| 
| 


a 





SU ea ; 
escription: The unit which has command and 
control of this unit. 
value class: UNIT 
| 
| 


Uniteeunetaen 
deScripticn: eyRe of function or service which 
the unit performs. 
value class: UNIT _FUNC 


Officer_assicsned__. 
description: Officers who are assigned to this 


unit. 
value class: OFFICER . 
inverse 7 Unie assigned 


multivalued 
identifiers: 


Unit_code 


ASSIGNEE Te REQUEST 


description: The reguest which is made by any unit, 
aktout officers who have certain. 
Specifications Ei t stomeas specific 
position to be assigned. 


memper attributes: 


Unit_code . ; ae 
deScripticn: The unit who is issued the 
request for assignment. 
value class: UNIT CODE 
mandatory 
not changeable 


a UO : ; 
escripticn: A number which is given by the 
unit who 1S issued the request. 
value class: REQUEST_NO 
mandatory 
not changeable 


Date 
value class: DATE 


Rank 
descripticn: Rank of the officer who is 
Eeques tee for assignment. 
value class: RANK 


Primary _branch_requested 
value class: BRANCHES 
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Figure A.6 SDM Design for Personnel Database (cont'd.). 
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Figure A.7 


Secondary -Eranch requested 
value class: BRANCHES 


Acadenic_major_reguested. 
descripticn: Academic major and degree for 
this assignment 
value class: ACADEMIC_MAJOR 
multivalued 


Military _course/education_regquested 
descripticn: Military education and/or course 
- , which is needed for this 
Soa es: 
value class: MILITARY _EDUCATION/COURSES 
multivalued 


Medical status 
descripticn: Lowest value for medical status 
which is needed for this 
assignment. 
value class: HEALTH _STATUS 


Number_of_ person ; 
description: Number of officer requested with 
this assignment request. 
value class: INTEGERS 


class attributes: 


Number_of requests 
descripticn: The number of requests that 
j issued in the current year. 
derivation: Number of members in this class 
which Date= current year. 


identifiers: 


Unit _code + Request_number 
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SDM Design for Personnel Database (cont'd.). 
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: 


MID 
interclass connection: subclass of STRINGS where 
format is 5 digit numbers 


RANK 
interclass connection: subclass or STRINGS where 
specified 


TATE 
interclass connection: subclass of STRINGS where 
format is: 


month: number where 21 and <12 


day: humber where integer and 21 and $31 


year: number where integer and eat 208 
where ja5 een =4 or =5 or =9 or =11 

S S0peand. (if omen =2 then day ~<29) 
ordering Thy year, month, day 


EERSON NAMES 
interclass connection: subclass of STRINGS 


COUNTRY : 
interclass connection: subclass of STRINGS where 
specified 


Sia 
interclass connection: subclass of STRINGS where 
format is 1 character: a, 


MARITAL STATUS 
interclass connection: subclass of STRINGS where 
format is 1 character: S, M, D, 


ADDRESS 
interclass connection: subclass of STRINGS 


ERANCHES ; 
interclass connection: subclass of STRINGS where 
specified 


ACALTEMIC BRANCHES 
interclass connection: subclass of STRINGS where 
specified 


ACADEMUMG DEGREES 
interclass connection: subclass of STRINGS where 
values are: BA, BS, MA, MS, ENG, PhD 


UNIVERSITY NAMES 
interclaSs connection: subclass of STRINGS 


COURSE sGne Ore eOlE 
interclass Connection: subclass o£ STEKINGS where 
format is 5 characters 


COURSE /sGH@Olm TIILES 
interclass Connection: subclass of STRINGS 


ee ti mm 9 am en epee pn, een ag cig a a gy my ee iy a me ts a el ee SRS any a en ene eine api ee 
. 


fo 


Figure A.8 


Domains of Attributes. 
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COURSE/SCHOOL_DESCRIPTION 7 


interclass Connection: subclass of STRINGS 


COURSE GRADES 
interclass connection; subclass of STRINGS where 
format is z characters 


REPORT NUMBER 
interclass connection: subclass of STRINGS where 
format is 6 digit number 


HEIGHT 
interclass connection: subclass of STRINGS where 
format is fositive integer 


WEIGHT 
interclass connection: subclass of STRINGS where 
format is fositive integer 


ELOCD PRESSURE 
interclass connection: subclass of STRINGS where 
Specified 


EYE GCOnNDETION 
interclass connection: subclass of STRINGS where 
specified 


FARK CCNDITION 
interclass connection: subclass of STRINGS where 
specified 


INTERNAL CONDITICN. 
interclass connection: subclass of STRINGS where 
specified 


EFALTH STATUS 
Tievenemass Connection. subclass of STRINGS where 
format is 2 digit number 


I ANGUAGES 
1pterclass connection: subclass of STRINGS where 
specified 


LANGUAGE CAPABILITY 
interclass connection: subclass of STRINGS where 
format is 1 digit number 


Ui ECVE 
interclass connection: subclass of STRINGS where 
specified 


UNIT NAMES 
interclass connection: subclass of STRINGS 


GNIT CAT 
interciass connection: subclass of STRINGS where 
couna t Ps Seehbaeaerers Gon mb V, BL, REG, 
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Figure A.9 Lomains of Attributes (cont'd). 
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UNIT LOCATION 
interclass connection: subclass of STRINGS where 
specified 


UNIT FUNC 
interclass connection: subclass of STRINGS where 
format is 6 characters 


REQUEST_NO 
interclass connection: subclass of STRINGS where 
format is 6 digit number 
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Figure A.10 Domains of Attributes (cont'd). 
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SAMPLE RELATION 


Thaseieeen dix 


DataLase, which are used to implement. 


this datarase is based on these sample 


UFT> SELECT 
2 FROM OFFICER; 


MID RANK JNAME 
27363 ¢caot Jonnson 
12239 naj cternandez 
32458 i1t Roobdins 
493596 21e Smith 
10999 leolt 3rouwn 
35758 Ite Sreenoerg 
29364 canoer Janes 
167485 naj Lerygnton 
10792 col Stone 


9 records se#lecter. 


2. Eelation 


UFI> SELECT » 


2 


UCODE 
O1DI1V 
C93RG 
S4DEP 
12HOS 
O2AVI 
07SGP 
20A8T 
O3E3N 
0201 Vv 


FROM UNIT; 


UNAME 


lst Inf Div 
9thm Art Berg 
64th Iria Deoot 
2th Fd Hoso 
2nd Avia Unit 
Ptnesce Fes Go 
2Otn Airdef 3t 
3rd Eng Bn 

end Inf Div 


9 records selected. 


SEX PRITe3RAN SECe*dRAN 


J ads advoeaesd 3 


UC AT 
div 
org 
deo 
hoso 
reg 
ot 
ot 
bn 
div 


Shcews sample 


12 
oo 


PERSONNEL DATABASE 


relations of 


JITAERS 


artill Sip ot 
inftry soefe 
arirdaef ado 
REC oiltoe 
iInftry ado 
sijecsr Omlot 
nileag soefc 
fy 4aqc ado 
oritnan artill 
JLIE 


Pt. .Rilev,KS 
PreRilteveKS 
Fe. Riley,KS 
2 « oi ley,-KS 
Fe .Gardon,Ga 
Pt .8raqq,Nc 
Pe stood, Ix 
Fe eHs5oqa, [Xx 
Pt.hood, bx 
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SUeeuyIT JPUNC 


OFARM conoa 
Q1D Vv eConsu 
O1O0lVv conse 
O1D0T¥v conse 
O80! Vv comda 
820IV conpba 
020Iv consu 
O20lVv candDa 
O07AR™ CcCombdDa 


the Perscnnel 


The i1mplementaticn of 


relations. 


UFI> SELEGIms 
2 FROM AeYE UCATION; 


ABRAN ADEG 


AID 


Qqe =z «= es @@@ @& Se @2@@ @ @& 


math 8S 
law BA 
ee BS 
dent 8S 
angt BA 
es MS 
ee 8S 
cons 8S 
ee MS 
magt BA 
me 8S 
ee 4S 


27363 
12239 
32458 
43596 
10999 
10999 
35768 
29364 
29364 
16745 
10792 
10792 


UNIV 


JCLA,C4 
Purdue,IN 


Seattle,VA 
Berkeley,CA 


LoyolasIkl 
NPGS,CA 
Rice,ITx 
MIT,MA 
dhio-rOn 


Cornelis NY 
RutgerssNJ 


NPGS,CA 


12 records selected. 


UP I> SELE Gres 
2 FROM Me*EQUCATION 


3 ORDER 8Y CCODE4,CGRADE; 


CCODE4 


li records selected. 


MET) 


CGRADE CDATE 


SDATE 

06*AUGS75 
30-JSUL=70 
21-SEP-80 
25-MAY=d0 
O1-JUL -68 


-O07*0EC=73 


Ob-MAY=-80 
O9“DEC=7A 
10-JUL<-85 
SU=5EP=75 
31-AUG=65 
O1-MAR-70 


l2e-JUL<84 
23-NO0V=82 
13<APR=77 
26°0CT=<84 
ZIl\eJAN=77 
22-JUN=82 
22°NDV-76 
1120C1-70 
30-SE°=72 
01-DEC-69 
2O-FEB-82 


14g 


WT> SELECT ¢ 
2 FROM MeCOURSES; 


CCODES CLOCS CTITLE 
asoo3s Fe.Sitl),3K artillery 
AAL1O02 Fe.RucKcer, AL aviattor 
AD002 Fe.Sliss,1x airdet 
CS$302 Yonteray,ca aco 
C€$$09 Fe.marrison,IN ado 
Ta076 Fe. .Aarrisaon,IN aanin 
IS005S Fer.8enning,Gi infantry 
4S$706 Fet.Houston-IX nealtn 
OC092 Aberstesn,“0 ordthes 
$$S002 Ft .Gorsen,Ga signal 
10 reeords selected. 

6. Relation LANGUAGE: 


UFI> SEGQECT = 
2 FROM LANGUAGE; 


NLANGUAGE 
gerran 
french 
russian 
korean 
Qeraan 
turkiga” 


6&6 records 


PIO LDEGREE 


27303 5 
12239 q 
12239 7 
157455 2 
35758 > 
32458 4 


selected. 


GBESC 

Fiels Artillery Sersgol 
aviation Senao0l 
Army Aie Defense Seno01 
MeaD Officer Course 

Ago Officer Course 
Ins.for Agministeration 
Seay Infantry Serool 
Aca. af stealen Sei. 

Aray Ordg.eang Cres.Seno0} 
Army Signal Senrool 


Aray 
Aray 


1d 


COUR 


& 
OD 


oe Ota in O 


ving Oo € Wa m — ww WI 
& Oo 


co 


UF I> SECECT is 
2 FROM MEDICAL; 


REPNO 


983817 
.1328225 
245826 
37584 -1 
4B3B84~5 
1288504 
20985 <6 
24580-0 
37580<8 
25681 <2 


HIO 


ess) 
32458 
43596 
27363 
35768 
29364 
16745 
10999 
10792 
27363 


ROATE 


geee w@e@oae@ @& @S* Gee ea @@e @@eeoa @@e 


30=NOV=-81 
01-4AR=82 
30-AUG=82 
O7=JUN\84 
31-4UG=84 
09—-4AR<85 
14-APR-BS 
17*NOV=80 
06-DEC=80 
04-OCT-81 


10 records Selected. 


8. Relatio 


UFIT> SELECT » 
2 FROM ASSIGNMENT; 


AMID 
10792 
10999 
16745 
27363 
29364 
le239 
32458 
43596 
35768 
10792 
10999 
10792 


EYECOND EARCOND 


0 
Ne 
0 
It 
‘10 
0 
Lt 
ee 
45 
0 


Acs DENN ENT: 


AeUCODE ORDERNO 


O9BRG 
O10fVv 
O10{fV 
098RG 
64DEP 
O20IVv 
2O0ART 
12HOS 
O1O0lv 
O92DIV 
O2DIv 
64DEP 


038165S<1 
327967238 
YS6277=3 
32157827 
59387945 
491373=-6 
4B 2683-2 
321782<4 
152282<9 
324871245 
11827323 
324678} 


12 records Selected. 


ASGDATE 


OLeSEP 66 
CUMSEP -69 
C9-DEC -77 
L2°SEP=78 
OB =-MAR=-79 
30-OCT-73 
LO“JAN=83 
11-AJG=82 
OL-APR—-B2 
L1eJAN=71 
30“-ANG 73 
15<APR=-78 
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— 


™~ 
ON Of OOO - © 


HSTAT OTHERS 


oC UN On - =— Ww Oo 


9. Relation ASGKEQ: 


= SS oS eee — ae SS a 


UF I> SELECGH a 
2 FROM ASGRED; 


ReUCONE REQNUM REQNATE ReQANK RePRIBR ReSEC3R ReACABR ReMILED ReniSTAT 


@2eeee ae @eee@eqeee @Z{ea aagee@eees & @2a @@@ @& @e2@e ee @oe@q@ @2 @2@e @@e @ @ @2{2eeoe@ 2 @ @2e ea @@ @ @ 2 ae @@e @ @& 


NUMOFPERS 

1240S i halal 30“DEC $84 naj financ ado magt [A076 4 
20ABT cael D6-MAR=85 chor artill pitor ee AS003 2 
20SGP 327685294 2TH-APRHBS naj inftry soefce law {soos 0 


1 


laa 


fle 


12 


Joie 
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